= Common Crawl web graph
https://commoncrawl.org/web-graphs
In 2017 they apparently started publishing their own Web Graphs, i.e. they parse the crawled HTML and extract the graph of what links to what.
This is exactly what we need for an open implementation of <PageRank>.
Edit: actually, they already calculate <PageRank> for us!!! Fantastic!!! Main section: <Common Crawl web graph official PageRank>{full}.
The graphs are dumped in <BVGraph> format.
A quick exploration of the graph can be seen at: https://stackoverflow.com/questions/31321009/best-more-standard-graph-representation-file-format-graphson-gexf-graphml/79467334#79467334
Their source code is at: https://github.com/commoncrawl/cc-webgraph
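To get a feel for what computing <PageRank> over a "what links to what" graph involves, here is a minimal pure Python sketch using power iteration on a toy edge list. It is not how Common Crawl computes their official ranks: the host names, the `pagerank` function, and the `damping` and `iterations` parameters are all made up for illustration, and a real web graph would be far too large for this naive in-memory approach.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Naive power-iteration PageRank over a list of (source, target) links."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    # Start from a uniform distribution over all nodes.
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread uniformly,
        # so that the total rank keeps summing to 1.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Every node splits its damped rank evenly among its targets.
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical host-level links, purely illustrative.
    toy_edges = [
        ("a.example", "b.example"),
        ("c.example", "b.example"),
        ("d.example", "b.example"),
        ("b.example", "a.example"),
    ]
    ranks = pagerank(toy_edges)
    for host in sorted(ranks, key=ranks.get, reverse=True):
        print(host, round(ranks[host], 4))
```

In this toy graph `b.example` receives the most inbound links and should come out on top, with `a.example` close behind because it is the only page `b.example` links to.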