The steps are sorted in roughly chronological order. The project might fail at any point, and some steps may be carried out in parallel:
- make OurBigBook Markup good enough that it can generate a static version of the website, which is used to prototype certain ideas and for Ciro to start writing test content. Status March 2022: reached a point where it is already highly usable. Work on the website step that follows may continue.
- create a basic implementation of the website, without advanced features like PageRank sorting and WYSIWYG. This is not much more than a blog with some extra metadata, so it is definitely achievable with constrained resources.
- find a university teacher who would like to try it out. Ciro would like to volunteer to work for free for this teacher and their students, to help the students learn. He would like to act like a "super student" who has a lot of free time and motivation. Ciro would start by mapping the headers of the lecture notes onto the website, and then slowly add content as he feels the need to improve certain explanations. Finding teachers willing to allow this will be a major roadblock: how to convince teachers to use CC BY-SA? If such an enlightened teacher is found, it will allow for the initial validation of the website, to decide what kind of tweaking the idea might need, and to start uploading quality technical content to the site.
- once some level of validation has been done, Ciro will start looking for charitable grant opportunities more aggressively.
- if things seem to be working, start adding more advanced features: PageRank-like ranking and WYSIWYG editing. The recommendation algorithm notably is left for a second stage because it needs real world data to be tested. And at the beginning, before Eternal September kicks in, there would be few posts, written by well educated university students, so a simple sort by upvote would likely be good enough.
Ciro decided to start with a decent markup language with a decent implementation: OurBigBook Markup. Once that gets reasonable, he will move on to another attempt at the website itself.
The project description was originally at: github.com/cirosantilli/write-free-science-books-to-get-famous-website but is being migrated here. The original working project name was "Write free books to get famous website", until Ciro decided to settle for OurBigBook.com and fixed the domain name.
github.com/cirosantilli/cirosantilli.github.io/issues/198. Previously at: stackoverflow.com/questions/31321009/best-more-standard-graph-representation-file-format-graphson-gexf-graphml/79467334#79467334 but Stack Overflow fucking deleted the question.
I wanted to do a quick exploration of open PageRank implementations and data.
My general motivation for this is that a PageRank-like algorithm could be useful for more accurate user and article ranking on OurBigBook, see: Section "PageRank-like ranking".
But it could also be just generally cool to apply it to other graph datasets, e.g. for computing a Wikipedia internal PageRank.
A quick Google reveals only Open PageRank, but their methods are apparently closed source.
Then I had a look at the Common Crawl web graph data to see if I could easily calculate it myself, and... they already have it! See: Section "Common Crawl web graph official PageRank"
Their graph dumps are in BVGraph graph file format, which is the native format of the WebGraph framework, which implements the format and algorithms such as PageRank.
The only thing I miss is a command line interface to calculate the PageRank. That would be so awesome.
The more I look at it the more I love Common Crawl.
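To give a rough idea of what such a PageRank calculation involves, here is a minimal power iteration sketch in plain Python. It is not the WebGraph implementation and would not scale to the Common Crawl graph: the function name `pagerank`, the toy graph, the damping factor of 0.85 and the fixed iteration count are all assumptions made for illustration.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Toy PageRank by power iteration.

    graph: dict mapping each node to the list of nodes it links to.
    The damping factor and iteration count are illustrative defaults.
    """
    # Collect every node, including ones that only appear as link targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)

    # Start from a uniform distribution.
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly,
        # so that the total rank stays equal to 1.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}

        # Each node passes an equal share of its rank to every link target.
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page graph: "d" links out but nothing links to it.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_graph)
for node, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(node, round(score, 4))
```

In this toy graph the page with the most incoming links ends up on top and the page nobody links to ends up last, which is the kind of signal that could complement a simple sort by upvote.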
Announcements:
In cc-main-2024-25-dec-jan-feb-domain-ranks.txt:
- cirosantilli.com was ranked ~453k
- ourbigbook.com was at ~606k
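A lookup like this can be scripted. The sketch below assumes the ranks file is tab-separated with a `#`-prefixed header line, with the harmonic centrality position in the first column, the PageRank position in the third, and the domain in reversed form (e.g. `com.example`) in the fifth. That layout is an assumption to check against the actual file, and the function name `find_domain_ranks` and the sample rows are made up for illustration.

```python
import io


def find_domain_ranks(lines, domains):
    """Return rank positions for the given domains.

    lines: iterable of text lines from a domain ranks file.
    Assumed layout (verify against the real file): tab-separated columns
    harmonic position, harmonic value, PageRank position, PageRank value,
    reversed domain name, number of hosts.
    """
    # Map reversed names such as "com.example" back to "example.com".
    wanted = {".".join(reversed(d.split("."))): d for d in domains}
    found = {}
    for line in lines:
        if line.startswith("#"):
            continue  # skip the header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip malformed or empty lines
        host_rev = fields[4]
        if host_rev in wanted:
            found[wanted[host_rev]] = {
                "harmonic_pos": int(fields[0]),
                "pagerank_pos": int(fields[2]),
            }
    return found


# Made-up sample rows, not real Common Crawl data.
sample = (
    "#harmonicc_pos\t#harmonicc_val\t#pr_pos\t#pr_val\t#host_rev\t#n_hosts\n"
    "1\t3.0e7\t2\t0.01\torg.example\t10\n"
    "2\t2.9e7\t1\t0.02\tcom.example\t5\n"
)
result = find_domain_ranks(io.StringIO(sample), ["example.com"])
print(result)
```

For the real file, the same function could be fed an open file handle instead of the `io.StringIO` sample, since it only iterates over lines.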