OurBigBook About$ Donate
 Sign in+ Sign up
by Ciro Santilli (@cirosantilli, 37)

Common Crawl web graph

Updated 2025-05-13, created 2025-02-26
commoncrawl.org/web-graphs
Since 2017, Common Crawl has apparently been publishing its own web graphs, i.e. they parse the crawled HTML and extract the graph of what links to what.
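To make "extract the graph of what links to what" concrete, here is a minimal sketch using only the Python standard library. It is not Common Crawl's actual pipeline: the names `LinkExtractor` and `host_edges` are made up for illustration, and collapsing pages down to hostnames is just a simplification chosen for this sketch.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative and protocol-relative links.
                self.links.append(urljoin(self.base_url, value))


def host_edges(pages):
    """Hypothetical helper: pages maps page URL -> HTML string.

    Returns a set of (source_host, target_host) edges, skipping
    links that stay on the same host or have no host at all.
    """
    edges = set()
    for url, html in pages.items():
        parser = LinkExtractor(url)
        parser.feed(html)
        src = urlparse(url).netloc
        for link in parser.links:
            dst = urlparse(link).netloc
            if dst and dst != src:
                edges.add((src, dst))
    return edges
```

Feeding it a dict of a few downloaded pages yields an edge set that can be handed directly to a graph algorithm.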
Such a link graph is exactly what we need for an open implementation of PageRank.
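As a rough illustration of why an edge list is all that is needed, here is a minimal power-iteration PageRank sketch in plain Python. The damping factor of 0.85, the fixed iteration count, and the even redistribution of rank from nodes without outgoing links are assumptions of this sketch, not something taken from Common Crawl's code.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Return a dict node -> rank for an iterable of (src, dst) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank
```

For example, `pagerank([("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")])` gives `d` the lowest rank, since nothing links to it. At the scale of a real crawl this dict-based approach would be far too slow and memory-hungry, which is one reason compressed formats and dedicated tooling exist.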
Edit: they actually already calculate PageRank for us, fantastic! Main section: "Common Crawl web graph official PageRank".
The graphs are dumped in BVGraph format.
A quick exploration of the graph can be seen at: github.com/cirosantilli/cirosantilli.github.io/issues/198
Their source code is at: github.com/commoncrawl/cc-webgraph

 About$ Donate Content license: CC BY-SA 4.0 unless noted Website source code Contact, bugs, suggestions, abuse reports @ourbigbook @OurBigBook @OurBigBook