Common Crawl web graph

ID: common-crawl-web-graph

Common Crawl web graph by Ciro Santilli 35
Since 2017, Common Crawl has apparently been producing its own web graphs, i.e. they parse the crawled HTML and extract the graph of what links to what.
This is exactly what we need for an open implementation of PageRank.
Edit: actually, they already calculate PageRank for us!!! Fantastic!!! Main section: Section "Common Crawl web graph official PageRank".
The graphs are dumped in BVGraph format.
Their source code is at: github.com/commoncrawl/cc-webgraph
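
To make the "parse the HTML, extract the link graph, run PageRank" idea concrete, here is a minimal toy sketch in Python. It is not Common Crawl's pipeline and it does not read BVGraph files: the three in-memory pages, the example.com-style URLs, and the helper names LinkExtractor, build_graph and pagerank are all made up for illustration, and the PageRank is a plain power iteration with an assumed damping factor of 0.85.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collect absolute href targets of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def build_graph(pages):
    """Map each URL to the set of known URLs it links to (no self-loops)."""
    graph = {url: set() for url in pages}
    for url, html in pages.items():
        parser = LinkExtractor(url)
        parser.feed(html)
        for target in parser.links:
            if target in pages and target != url:
                graph[url].add(target)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank; dangling nodes spread rank evenly."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        dangling = sum(rank[node] for node, outs in graph.items() if not outs)
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in graph
        }
        for node, outs in graph.items():
            for target in outs:
                new_rank[target] += damping * rank[node] / len(outs)
        rank = new_rank
    return rank


# Toy input, invented for this example: a -> b, a -> c, b -> c, c -> a.
pages = {
    "http://a.example/": '<a href="http://b.example/">b</a> <a href="http://c.example/">c</a>',
    "http://b.example/": '<a href="http://c.example/">c</a>',
    "http://c.example/": '<a href="http://a.example/">a</a>',
}
graph = build_graph(pages)
ranks = pagerank(graph)
for url, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{score:.4f} {url}")
```

In this toy graph the page at c.example should come out on top, since both other pages link to it. At real web scale a dict of sets would not fit in memory, which is presumably part of why the dumps use a compressed format like BVGraph.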
