The steps are listed in roughly chronological order. The project might fail at any point, and some steps may be carried out in parallel:
- create a basic implementation of the website, without advanced features like PageRank sorting and WYSIWYG editing. This is not much more than a blog with some extra metadata, so it is definitely achievable with constrained resources.
- Ciro would like to volunteer to work for free with a teacher and their students to help the students learn. Ciro would start by mapping the headers of the lecture notes onto the website, and then slowly add content as he feels the need to improve certain explanations. Finding teachers willing to allow this will be a major roadblock, notably convincing them to use CC BY-SA.
- once some level of validation has been done, Ciro will start looking for charitable grant opportunities more aggressively
- if things seem to be working, start adding more advanced features: PageRank-like ranking and WYSIWYG editing. The recommendation algorithms notably are left for a second stage because they need real-world data to be tested. And at the beginning, before Eternal September kicks in, there would be few posts, written by well-educated university students, so a simple sort by upvotes would likely be good enough.
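A minimal sketch of that "simple sort by upvotes" baseline, assuming a hypothetical `Post` record with an upvote count and a creation date. The field names and the choice to break ties by showing newer posts first are illustrative assumptions, not how OurBigBook actually does it.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    # Hypothetical minimal post record, for illustration only.
    title: str
    upvotes: int
    created: datetime


def sort_by_upvotes(posts):
    """Return posts with the most upvotes first; ties go to the newest post."""
    return sorted(posts, key=lambda p: (p.upvotes, p.created), reverse=True)


if __name__ == "__main__":
    posts = [
        Post("Fourier series", 3, datetime(2021, 1, 1)),
        Post("Linear algebra", 7, datetime(2020, 5, 1)),
        Post("Group theory", 3, datetime(2022, 3, 1)),
    ]
    for p in sort_by_upvotes(posts):
        print(p.upvotes, p.title)
```

Such a baseline needs no tuning data at all, which is what makes it a reasonable placeholder until there is enough real usage to evaluate a PageRank-like ranking.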
Ciro decided to start with a decent markup language with a decent implementation: OurBigBook Markup. Once that is in reasonable shape, he will move on to another attempt at the website itself.
The project description was originally at github.com/cirosantilli/write-free-science-books-to-get-famous-website but is being migrated here. The original working project name was "Write free books to get famous website", until Ciro decided to settle on OurBigBook.com and fix the domain name.

github.com/cirosantilli/cirosantilli.github.io/issues/198. Previously at: stackoverflow.com/questions/31321009/best-more-standard-graph-representation-file-format-graphson-gexf-graphml/79467334#79467334 but Stack Overflow fucking deleted the question.
My general motivation for this is that a PageRank-like algorithm could be useful for more accurate user and article ranking on OurBigBook, see: Section "PageRank-like ranking"
But it could also just be generally cool to apply it to other graph datasets, e.g. to compute a Wikipedia-internal PageRank.
Then I had a look at the Common Crawl web graph data to see if I could easily calculate it myself, and... they already have it! See: Section "Common Crawl web graph official PageRank"
Their graph dumps are in the BVGraph file format, the native format of the WebGraph framework, which implements both that format and algorithms such as PageRank.
The only thing I miss is a command line interface to calculate the PageRank. That would be so awesome.
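To give a rough idea of what such a command line interface could look like, here is a toy Python sketch. It does not read BVGraph files and is not part of WebGraph: it assumes a plain-text edge list with one `source target` pair of integer node IDs per line, keeps the whole graph in memory, and computes PageRank by plain power iteration with the usual damping factor of 0.85. Nodes without outgoing links spread their rank uniformly over all nodes, which is one common convention among several. It is only suitable for small graphs, nowhere near the size of the Common Crawl web graph.

```python
import argparse
import sys


def pagerank(edges, damping=0.85, tol=1e-10, max_iter=100):
    """Compute PageRank by power iteration over (source, target) edges.

    Returns a dict mapping every node to its score; the scores sum to 1.
    """
    out_links = {}
    nodes = set()
    for src, dst in edges:
        out_links.setdefault(src, []).append(dst)
        nodes.add(src)
        nodes.add(dst)
    n = len(nodes)
    if n == 0:
        return {}
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(max_iter):
        # Rank held by nodes without outgoing links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if v not in out_links)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = dict.fromkeys(nodes, base)
        for src, targets in out_links.items():
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


def read_edges(lines):
    """Yield (source, target) integer pairs, skipping blank and '#' lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst = line.split()[:2]
        yield int(src), int(dst)


def main(argv=None):
    parser = argparse.ArgumentParser(description="PageRank of an edge list.")
    parser.add_argument("edges", nargs="?", help="edge list file, '-' for stdin")
    parser.add_argument("--damping", type=float, default=0.85)
    parser.add_argument("--top", type=int, default=10, help="how many nodes to print")
    args = parser.parse_args(argv)
    if args.edges is None:
        parser.print_usage()
        return
    if args.edges == "-":
        ranks = pagerank(read_edges(sys.stdin), damping=args.damping)
    else:
        with open(args.edges) as f:
            ranks = pagerank(read_edges(f), damping=args.damping)
    best = sorted(ranks.items(), key=lambda kv: kv[1], reverse=True)
    for node, score in best[: args.top]:
        print(f"{node}\t{score:.6g}")


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name like `pagerank.py`, it would be run as `python pagerank.py edges.txt --top 5`. A real tool for web-scale graphs would instead stream the compressed BVGraph directly, which is exactly what WebGraph is designed for.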
Announcements:
In cc-main-2024-25-dec-jan-feb-domain-ranks.txt:
- cirosantilli.com was ranked ~453k
- ourbigbook.com was ranked ~606k
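To look such positions up without scanning the file by hand, a small lookup sketch could help. It rests on assumptions not verified here against Common Crawl's documentation: that the ranks file is tab-separated, that its first line is a header of `#`-prefixed column names, and that each domain is stored in reversed form (e.g. `com.cirosantilli`) under a column named `host_rev`. If the real header differs, adjust `name_column`. The file may be plain or gzip-compressed; `open_text` handles both.

```python
import gzip
import sys


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, encoding="utf-8")


def reverse_domain(domain):
    """Turn 'cirosantilli.com' into 'com.cirosantilli'."""
    return ".".join(reversed(domain.split(".")))


def lookup_ranks(lines, domains, name_column="host_rev"):
    """Return {domain: row} for each requested domain found in the file.

    ASSUMPTION: tab-separated columns, a first header line of
    '#'-prefixed column names, and domains stored reversed in name_column.
    """
    wanted = {reverse_domain(d): d for d in domains}
    header = None
    found = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if header is None:
            header = [name.lstrip("#") for name in fields]
            continue
        row = dict(zip(header, fields))
        key = row.get(name_column)
        if key in wanted:
            found[wanted[key]] = row
            if len(found) == len(wanted):
                break  # every requested domain was found, stop early
    return found


if __name__ == "__main__":
    if len(sys.argv) < 3:
        print(f"usage: {sys.argv[0]} RANKS_FILE DOMAIN...", file=sys.stderr)
    else:
        with open_text(sys.argv[1]) as f:
            for domain, row in lookup_ranks(f, sys.argv[2:]).items():
                print(domain, row)
```

With a hypothetical script name, usage would be `python lookup_ranks.py cc-main-2024-25-dec-jan-feb-domain-ranks.txt cirosantilli.com ourbigbook.com`. Each printed row contains every column of the matching line, so it shows whichever position columns the file actually provides.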