Intel department by Ciro Santilli 37 Updated +Created
Intel hardware by Ciro Santilli
Intel employee by Ciro Santilli
École Polytechnique alumnus by year by Ciro Santilli
For example, Ciro Santilli's year started in 2009, although as a foreign student he only arrived at the start of 2010, and his promotion is usually known simply as X09. Now that the century barrier has been broken, one day we will need to spell it out as X2009.
Rob Pike by Ciro Santilli
PhD thesis by Ciro Santilli
WebGraph (software) by Ciro Santilli
BVGraph by Ciro Santilli
The native file format of WebGraph.
It is a binary format and highly storage efficient.
TODO meaning of "BV"? Presumably the initials of the WebGraph authors, Paolo Boldi and Sebastiano Vigna.
A quick hands-on introduction to the format by Ciro Santilli can be found at: github.com/cirosantilli/cirosantilli.github.io/issues/198
Cancer research by Ciro Santilli
Updates / Quick fun with the Common Crawl web graph by Ciro Santilli
I wanted to do a quick exploration of open PageRank implementations and data.
My general motivation for this is that a PageRank-like algorithm could be useful for more accurate user and article ranking on OurBigBook, see: Section "PageRank-like ranking"
But it could also be just generally cool to apply it to other graph datasets, e.g. for computing a Wikipedia-internal PageRank.
A quick Google search turns up only Open PageRank, but their methods are apparently closed source.
Then I had a look at the Common Crawl web graph data to see if I could easily calculate it myself, and... they already have it! See: Section "Common Crawl web graph official PageRank"
Their graph dumps are in the BVGraph file format, the native format of the WebGraph framework, which implements both the format and algorithms such as PageRank.
The only thing I miss is a command line interface to calculate the PageRank. That would be so awesome.
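To make the idea concrete, here is a minimal sketch of PageRank by power iteration on a tiny in-memory graph. It is not WebGraph's implementation and does not read BVGraph files: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count and the toy edge list are all assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (source, target) edges.

    Illustrative sketch only: parameters and names are assumptions,
    not taken from WebGraph or Common Crawl.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Every node splits its damped rank equally among its targets.
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: "c" receives the most links, "d" receives none.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("d", "c")]
toy_ranks = pagerank(toy_edges)
for node, score in sorted(toy_ranks.items(), key=lambda item: -item[1]):
    print(f"{node}\t{score:.4f}")
```

A dictionary-based version like this only works for graphs that fit comfortably in memory, which is exactly why compressed representations such as BVGraph matter for web-scale data.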
The more I look at it the more I love Common Crawl.
In cc-main-2024-25-dec-jan-feb-domain-ranks.txt:
  • cirosantilli.com was ranked ~453k
  • ourbigbook.com was at ~606k
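A lookup like the one above can be sketched in a few lines of Python. The column layout is an assumption, not something stated here: the sketch assumes a tab-separated file with a `#`-prefixed header line, a rank position in column index 2 and the domain in reversed notation (e.g. `com.cirosantilli`) in column index 4. Both indices are parameters so they can be adjusted after checking the real header, and the sample rows are made up.

```python
import io


def reverse_domain(domain):
    """Turn 'example.org' into 'org.example' (assumed notation of the file)."""
    return ".".join(reversed(domain.split(".")))


def find_ranks(lines, domains, pos_col=2, name_col=4):
    """Return {domain: rank position} for the requested domains.

    pos_col and name_col are assumed column indices; check the header
    of the real ranks file before relying on them.
    """
    wanted = {reverse_domain(domain): domain for domain in domains}
    found = {}
    for line in lines:
        if line.startswith("#"):
            continue  # skip the header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(pos_col, name_col):
            continue  # skip malformed or short lines
        if fields[name_col] in wanted:
            found[wanted[fields[name_col]]] = int(fields[pos_col])
    return found


# Made-up sample data in the assumed layout, not real Common Crawl values.
sample = io.StringIO(
    "#harmonicc_pos\t#harmonicc_val\t#pr_pos\t#pr_val\t#host_rev\t#n_hosts\n"
    "1\t3.0e7\t2\t0.01\tcom.example\t10\n"
    "2\t2.0e7\t1\t0.02\torg.example\t5\n"
)
sample_ranks = find_ranks(sample, ["example.org"])
print(sample_ranks)
```

For the real file, the same function could be fed an open file handle instead of the `io.StringIO` sample, since it streams line by line and never loads the whole file into memory.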
White-East asian mixed by Ciro Santilli
Multiracial by Ciro Santilli
White people by Ciro Santilli
Race (human categorization) by Ciro Santilli
Atherton, California by Ciro Santilli
Municipality in San Mateo County by Ciro Santilli
San Mateo County by Ciro Santilli
Municipality in Illinois by Ciro Santilli