As of 2025, the Common Crawl web graph also dumps its own PageRank for each release. See e.g. the file cc-main-2024-25-dec-jan-feb-host-ranks.txt.gz at: data.commoncrawl.org/projects/hyperlinkgraph/cc-main-2024-25-dec-jan-feb/index.html
The first 20 rows are:
#harmonicc_pos #harmonicc_val #pr_pos #pr_val #host_rev
1 3.4626736E7 3 0.005384977821460953 com.facebook
2 3.42356E7 2 0.007010813553170503 com.googleapis.fonts
3 3.007577E7 1 0.008634952900502719 com.google
4 3.0036014E7 4 0.004411782034463272 com.googletagmanager
5 2.9900088E7 5 0.0036940035989790525 com.youtube
6 2.9537252E7 6 0.0032959808223701 com.instagram
7 2.9092556E7 9 0.0027616338842143423 com.twitter
8 2.7346152E7 7 0.0032101332824200743 com.gstatic.fonts
9 2.6818654E7 11 0.0017699438634060259 com.linkedin
10 2.5383126E7 8 0.0027849243241515574 org.gmpg
11 2.3747762E7 12 0.0016577826631867043 com.google.maps
12 2.3514198E7 15 0.0013399414238881337 com.googleapis.ajax
13 2.3504832E7 16 0.0012791339750445332 com.google.play
14 2.337092E7 47 3.794876113587071E-4 be.youtu
15 2.2925148E7 14 0.0013857916784687163 com.cloudflare.cdnjs
16 2.2851038E7 18 0.0012066313543285154 com.google.plus
17 2.2833728E7 13 0.0015745738381307273 org.wordpress
18 2.2830926E7 36 6.02400471665468E-4 com.pinterest
19 2.27056E7 45 4.001342924757244E-4 com.google.support
20 2.2687704E7 24 9.381217848819624E-4 net.jsdelivr.cdn
So quite plausible, except for org.gmpg. What the fuck is that and why is it ranked so high? Is it a quirk with the hosts inside subdomains?
Perhaps a more relevant dump might be the domain-only one, cc-main-2024-25-dec-jan-feb-domain-ranks.txt.gz:
#harmonicc_pos #harmonicc_val #pr_pos #pr_val #host_rev #n_hosts
1 3.1238044E7 3 0.01110707704411023 com.facebook 3632
2 3.0950192E7 2 0.016650558868491434 com.googleapis 3470
3 3.000803E7 1 0.01749148008448444 com.google 14053
4 2.7319046E7 5 0.00670112168785935 com.instagram 789
5 2.7020862E7 7 0.005464885844102939 com.youtube 1628
6 2.6954494E7 4 0.007740808154448889 com.googletagmanager 42
7 2.6344278E7 8 0.0052073382920908295 com.twitter 712
8 2.5414934E7 6 0.0058790483755603844 com.gstatic 171
9 2.4803688E7 11 0.0038589161241338816 com.linkedin 690
10 2.4683842E7 10 0.004929923081722034 org.gmpg 2
11 2.3575146E7 9 0.005111453489231459 com.cloudflare 951
12 2.2735678E7 14 0.002131882799792225 com.gravatar 98
13 2.2356142E7 12 0.002513741654851857 org.wordpress 1250
14 2.2132868E7 15 0.0019991529719988496 com.apple 3261
15 2.2095914E7 31 0.0010706467268355303 org.wikipedia 2099
16 2.2057972E7 21 0.0015644264715267535 com.pinterest 360
17 2.1941062E7 40 8.52391305373285E-4 be.youtu 15
18 2.1826452E7 16 0.0018442726685905964 net.jsdelivr 40
19 2.1764224E7 34 9.747994384099485E-4 gl.goo 951
20 2.1690982E7 35 9.740295347556525E-4 com.vimeo
But nope, org.gmpg is still there! vigna.di.unimi.it/ftp/papers/GraphStructure.pdf comments on it, so it appears to be a computer-readable ontology mechanism along the lines of Resource Description Framework which interlinks many websites. The article also mentions another interesting source of noise in miibeian.gov.cn, which every Chinese website is required to link to for their ICP license.
The downside of "Katz centrality" compared to PageRank appears to be that if a big node links to many, many nodes, all of those earn a lot of reputation, regardless of how many outgoing links there are.
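To make the Katz versus PageRank difference concrete, here is a minimal sketch in plain Python. The toy graph (50 "fan" pages linking to one "hub", which links out to either 2 or 20 "target" pages), the function names and the parameter values (alpha, beta, damping) are all assumptions made up for illustration, not anything taken from Common Crawl. Both scores are computed by simple fixed-point iteration.

```python
def katz(out_links, alpha=0.1, beta=1.0, iterations=100):
    """Katz centrality: x_i = beta + alpha * (sum of x_j over pages j linking to i)."""
    x = {node: 0.0 for node in out_links}
    for _ in range(iterations):
        new = {node: beta for node in out_links}
        for src, targets in out_links.items():
            for dst in targets:
                # Each link passes on the full alpha * x[src], however many links src has.
                new[dst] += alpha * x[src]
        x = new
    return x


def pagerank(out_links, damping=0.85, iterations=100):
    """PageRank: the score of a page is split evenly among its outgoing links."""
    n = len(out_links)
    x = {node: 1.0 / n for node in out_links}
    for _ in range(iterations):
        # Pages without outgoing links spread their score over all pages.
        dangling = sum(x[src] for src, targets in out_links.items() if not targets)
        new = {node: (1.0 - damping) / n + damping * dangling / n for node in out_links}
        for src, targets in out_links.items():
            for dst in targets:
                new[dst] += damping * x[src] / len(targets)
        x = new
    return x


def hub_graph(n_fans, n_targets):
    """Hypothetical toy graph: n_fans pages link to 'hub', 'hub' links to n_targets pages."""
    graph = {"hub": [f"target{i}" for i in range(n_targets)]}
    for i in range(n_fans):
        graph[f"fan{i}"] = ["hub"]
    for i in range(n_targets):
        graph[f"target{i}"] = []
    return graph


for n_targets in (2, 20):
    g = hub_graph(50, n_targets)
    print(n_targets, "targets:",
          "katz =", katz(g)["target0"],
          "pagerank =", pagerank(g)["target0"])
```

In this toy setup, the Katz score of each target is the same whether the hub links to 2 pages or to 20, while the PageRank score of each target drops as the hub spreads its score over more links.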
This is the family of algorithms to which PageRank belongs.
In 2017, Common Crawl apparently started making their own web graphs, i.e. they parse the HTML and extract the graph of what links to what.
Edit: actually, they already calculate PageRank for us!!! Fantastic!!! Main section: Section "Common Crawl web graph official PageRank".
A quick exploration of the graph can be seen at: github.com/cirosantilli/cirosantilli.github.io/issues/198
Their source code is at: github.com/commoncrawl/cc-webgraph
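The ranks files published with each web graph release, such as cc-main-2024-25-dec-jan-feb-host-ranks.txt.gz, can be explored with a few lines of Python. The following is only a sketch: it assumes the file has been downloaded locally, that columns are whitespace-separated in the order #harmonicc_pos #harmonicc_val #pr_pos #pr_val #host_rev (the domain-level file adds a trailing #n_hosts column, ignored here), and that header lines start with `#`. The function names and the local path are hypothetical.

```python
import gzip


def unreverse(host_rev):
    """Turn reversed host notation like 'com.google.maps' into 'maps.google.com'."""
    return ".".join(reversed(host_rev.split(".")))


def parse_rank_line(line):
    """Parse one data row of a ranks file into a dict (assumed column order)."""
    harmonicc_pos, harmonicc_val, pr_pos, pr_val, host_rev = line.split()[:5]
    return {
        "harmonicc_pos": int(harmonicc_pos),
        "harmonicc_val": float(harmonicc_val),
        "pr_pos": int(pr_pos),
        "pr_val": float(pr_val),
        "host": unreverse(host_rev),
    }


def read_top(path, n=20):
    """Read the first n data rows of a gzipped ranks file, skipping '#' header lines."""
    rows = []
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.startswith("#") or not line.strip():
                continue
            rows.append(parse_rank_line(line))
            if len(rows) >= n:
                break
    return rows


# Hypothetical usage, assuming the file was downloaded to the current directory:
# for row in read_top("cc-main-2024-25-dec-jan-feb-host-ranks.txt.gz"):
#     print(row["pr_pos"], row["host"])
```

Since the files are sorted by harmonic centrality position, reading only the first rows avoids decompressing the whole dump.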
École Polytechnique students identify their academic year, or "promotion" in French, by the year in which it started.
For example, Ciro Santilli's year started in 2009, though as a foreign student he arrived only at the start of 2010, and Ciro's promotion is usually known just as X09. And as the century barrier is broken we'll start to need to specify as X2009 one day.
List of notable alumni:
A quick hands-on introduction to the software by Ciro Santilli can be found at: github.com/cirosantilli/cirosantilli.github.io/issues/198








