Apparently, in 2017 they started making their own web graphs, i.e. they parse the HTML and extract the graph of what links to what.
Edit: actually, they already calculate PageRank for us!!! Fantastic!!! Main section: Section "Common Crawl web graph official PageRank".
A quick exploration of the graph can be seen at: github.com/cirosantilli/cirosantilli.github.io/issues/198
Their source code is at: github.com/commoncrawl/cc-webgraph
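To make "parse the HTML and extract the graph of what links to what" concrete, here is a minimal sketch in Python using only the standard library. It is not how cc-webgraph does it; the names `LinkExtractor` and `host_edges` are made up for illustration, and it only looks at `<a href>` links on a single page, collapsed to host level.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def host_edges(page_url, html):
    """Return the set of (source_host, target_host) edges for one page."""
    parser = LinkExtractor()
    parser.feed(html)
    source = urlparse(page_url).hostname
    edges = set()
    for href in parser.hrefs:
        # Resolve relative links against the page URL before taking the host.
        target = urlparse(urljoin(page_url, href)).hostname
        # Skip links without a host (e.g. mailto:) and links to the same host.
        if target and target != source:
            edges.add((source, target))
    return edges


sample_html = (
    '<a href="https://ourbigbook.com/x">a</a> '
    '<a href="/local">b</a> '
    '<a href="mailto:x@example.com">c</a>'
)
sample_edges = host_edges("https://cirosantilli.com/", sample_html)
print(sample_edges)
```

Running this over every page of a crawl and merging the edge sets would give a host-level link graph, which is the kind of input a PageRank calculation needs.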
This appears to be the direct precursor project of the Common Crawl web graph official PageRank.
This section is about: wwwranking.webdatacommons.org/
Based on Common Crawl 2012, and they don't seem to be updating it regularly...
Created by the Università degli Studi di Milano.
This section is about more "open" PageRank implementations, notably using either or both of:
As of 2025, the most open and reproducible implementation appears to be whatever Common Crawl web graph official PageRank does, which is to use WebGraph. It's quite beautiful.
github.com/cirosantilli/cirosantilli.github.io/issues/198. Previously at: stackoverflow.com/questions/31321009/best-more-standard-graph-representation-file-format-graphson-gexf-graphml/79467334#79467334 but Stack Overflow fucking deleted the question.
My general motivation for this is that a PageRank-like algorithm could be useful for more accurate user and article ranking on OurBigBook, see: Section "PageRank-like ranking"
But it could also be just generally cool to apply it to other graph datasets, e.g. for computing a Wikipedia-internal PageRank.
Then I had a look at the Common Crawl web graph data to see if I could easily calculate it myself, and... they already have it! See: Section "Common Crawl web graph official PageRank"
Their graph dumps are in BVGraph graph file format, which is the native format of the WebGraph framework, which implements the format and algorithms such as PageRank.
The only thing I miss is a command line interface to calculate the PageRank. That would be so awesome.
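For intuition about what such a PageRank calculation does, here is a toy power-iteration sketch in pure Python. It is not WebGraph's implementation and would not scale to a real web graph; the function name `pagerank`, the damping factor of 0.85 and the fixed iteration count are assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Toy PageRank by power iteration over a list of (src, dst) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    # Start from a uniform distribution over all nodes.
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                # Each node splits its rank equally among its out-links.
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


toy_edges = [("a", "c"), ("b", "c"), ("c", "a")]
toy_ranks = pagerank(toy_edges)
for node, value in sorted(toy_ranks.items(), key=lambda item: -item[1]):
    print(node, round(value, 4))
```

In this toy graph `c` is linked from both `a` and `b`, so it ends up with the highest score, while `b`, which nobody links to, gets only the baseline share.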
Announcements:
In cc-main-2024-25-dec-jan-feb-domain-ranks.txt:
cirosantilli.com was ranked ~453k
ourbigbook.com was at ~606k
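A lookup like this can be scripted. The sketch below assumes, without having confirmed it against the official format notes, that the ranks file is tab-separated, that header lines start with `#`, that the host column stores names in reversed form (e.g. `com.cirosantilli`), and that the PageRank position and host sit at column indices 2 and 4. Check those defaults against the file header before trusting the output. The function names and the sample rows are made up for illustration and are not real data.

```python
import io


def reverse_host(domain):
    """Turn 'example.com' into 'com.example' (assumed storage form)."""
    return ".".join(reversed(domain.split(".")))


def find_ranks(lines, domains, pos_column=2, host_column=4):
    """Scan tab-separated rank lines and return {domain: rank position}.

    pos_column and host_column are assumptions about the file layout.
    """
    wanted = {reverse_host(domain): domain for domain in domains}
    found = {}
    for line in lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(pos_column, host_column):
            continue  # skip malformed or short lines
        host = fields[host_column]
        if host in wanted:
            found[wanted[host]] = int(fields[pos_column])
    return found


# Made-up rows in the assumed layout, only to exercise the function.
fake_file = io.StringIO(
    "#col0\t#col1\t#pr_pos\t#col3\t#host_rev\t#col5\n"
    "1\t0.9\t2\t0.01\tcom.example\t10\n"
    "2\t0.8\t1\t0.02\torg.example\t20\n"
)
fake_ranks = find_ranks(fake_file, ["example.com", "example.org"])
print(fake_ranks)
```

For the real file, pass an open text file handle of the decompressed cc-main-2024-25-dec-jan-feb-domain-ranks.txt instead of `fake_file`; the function streams line by line, so it does not need to load the whole file into memory.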