= Quick fun with the Common Crawl web graph
https://stackoverflow.com/questions/31321009/best-more-standard-graph-representation-file-format-graphson-gexf-graphml/79467334#79467334
I wanted to do a quick exploration of <open PageRank implementation and data>.
My general motivation for this is that a <PageRank>-like algorithm could be useful for more accurate user and article ranking on <OurBigBook>, see: <ourbigbook com/PageRank-like ranking>{full}
But it could also just be generally cool to apply it to other <graph> datasets, e.g. for computing a <Wikipedia internal PageRank>.
A quick <Google> search reveals only <Open PageRank>, but their methods are apparently closed source.
Then I had a look at the <Common Crawl web graph> data to see if I could easily calculate it myself, and... they already have it! See: <Common Crawl web graph official PageRank>{full}
Their graph dumps are in the <BVGraph> <graph file format>, the native format of the <WebGraph (software)> framework, which implements both the format and algorithms such as <PageRank>.
The only thing I miss is a command-line interface for calculating the PageRank. That would be so awesome.
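To get a feel for what such a command-line tool would compute, here is a minimal power-iteration PageRank sketch. It does not read <BVGraph> and is not the <WebGraph (software)> implementation: it assumes a hypothetical plain-text edge list with one `src dst` pair per line, which is only practical for small graphs. The damping factor of 0.85 is the commonly used default, and dangling nodes (no out-links) have their rank spread uniformly over all nodes.

```python
import sys


def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over an iterable of (src, dst) node-id pairs."""
    nodes = set()
    out_links = {}
    for src, dst in edges:
        nodes.add(src)
        nodes.add(dst)
        out_links.setdefault(src, []).append(dst)
    n = len(nodes)
    if n == 0:
        return {}
    # Start from the uniform distribution.
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if v not in out_links)
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Each node splits its damped rank evenly among its out-links.
        for src, dsts in out_links.items():
            share = damping * rank[src] / len(dsts)
            for dst in dsts:
                new[dst] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def main(path):
    # Hypothetical input: one whitespace-separated "src dst" pair per line.
    edges = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:
                edges.append((parts[0], parts[1]))
    # Print nodes from highest to lowest score.
    for node, score in sorted(pagerank(edges).items(), key=lambda kv: -kv[1]):
        print(f"{score:.6f}\t{node}")


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: python pagerank.py EDGE_LIST_FILE", file=sys.stderr)
```

Usage would be something like `python pagerank.py edges.txt`, where both the script name and `edges.txt` are placeholders. For the full Common Crawl graph you would want WebGraph itself, since a Python dictionary of every host would not fit comfortably in memory.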
The more I look at it the more I love <Common Crawl>.