New articles - OurBigBook.com

Ciro Santilli 40 Updated 2025-07-16

Per-table dumps created with mysqldump and listed at: dumps.wikimedia.org/. Most notably, for the English Wikipedia: dumps.wikimedia.org/enwiki/latest/

A few of the files are not actual tables but derived data, notably dumps.wikimedia.org/enwiki/latest/enwiki-latest-all-titles-in-ns0.gz from Download titles of all Wikipedia articles

The tables are "documented" under: www.mediawiki.org/wiki/Manual:Database_layout, e.g. the central "page" table: www.mediawiki.org/wiki/Manual:Page_table. But in many cases it is impossible to deduce what fields are from those docs.


enwiki-latest-category.sql by

Ciro Santilli 40 Updated 2025-07-16


dumps.wikimedia.org/enwiki/latest/enwiki-latest-category.sql.gz contains a list of categories. It only contains the categories and some counts, but it doesn't contain the subcategories and pages under each category, so it is a bit pointless.

The schema is listed at: www.mediawiki.org/wiki/Manual:Category_table

The SQL first defines the table:

CREATE TABLE `category` (
  `cat_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `cat_title` varbinary(255) NOT NULL DEFAULT '',
  `cat_pages` int(11) NOT NULL DEFAULT 0,
  `cat_subcats` int(11) NOT NULL DEFAULT 0,
  `cat_files` int(11) NOT NULL DEFAULT 0,
  PRIMARY KEY (`cat_id`),
  UNIQUE KEY `cat_title` (`cat_title`),
  KEY `cat_pages` (`cat_pages`)
) ENGINE=InnoDB AUTO_INCREMENT=249228235 DEFAULT CHARSET=binary ROW_FORMAT=COMPRESSED;

followed by a few humongous inserts:

INSERT INTO `category` VALUES (2,'Unprintworthy_redirects',1597224,20,0),(3,'Computer_storage_devices',88,11,0)

which we can see at: en.wikipedia.org/wiki/Category:Computer_storage_devices

We see that:

- en.wikipedia.org/wiki/Category:Computer_storage_devices_by_company is a subcategory of that category, and it appears in that file
- en.wikipedia.org/wiki/Acronis_Secure_Zone is a page of the category, and it does not appear

so the file contains only categories.

We can check this with:

sed -s 's/),/\n/g' enwiki-latest-category.sql | grep Computer_storage_devices

and it shows:

(3,'Computer_storage_devices',88,11,0
(521773,'Computer_storage_devices_by_company',6,6,0

So there doesn't seem to be any link between categories in this file, only per-category page and subcategory counts.
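The same split can be done in Python, which makes it easier to keep the count columns as numbers. This is a minimal sketch, not part of the original tooling: the names ROW_RE, parse_category_rows and iter_category_rows are hypothetical, and it assumes the dump keeps the five-column layout of the CREATE TABLE above, with each INSERT statement on a single line. Backslash escapes inside titles are left exactly as they appear in the dump.

```python
import gzip
import re

# One row of the `category` table:
# (cat_id,'cat_title',cat_pages,cat_subcats,cat_files)
# The title pattern tolerates backslash-escaped characters such as \' .
ROW_RE = re.compile(r"\((\d+),'((?:[^'\\]|\\.)*)',(-?\d+),(-?\d+),(-?\d+)\)")


def parse_category_rows(line):
    """Yield (cat_id, cat_title, cat_pages, cat_subcats, cat_files) from one INSERT line."""
    if not line.startswith("INSERT INTO `category`"):
        return  # CREATE TABLE, comments, etc. carry no rows
    for m in ROW_RE.finditer(line):
        cat_id, title, pages, subcats, files = m.groups()
        yield int(cat_id), title, int(pages), int(subcats), int(files)


def iter_category_rows(path):
    """Stream rows out of the gzipped dump without decompressing it to disk."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            yield from parse_category_rows(line)
```

Usage would be a loop such as `for row in iter_category_rows("enwiki-latest-category.sql.gz")`, filtering on `row[1]` to reproduce the grep above.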


enwiki-latest-categorylinks.sql by

Ciro Santilli 40 Updated 2025-07-16


dumps.wikimedia.org/enwiki/latest/enwiki-latest-categorylinks.sql.gz

The schema is listed at: www.mediawiki.org/wiki/Manual:Categorylinks_table

On the SQL:

CREATE TABLE `categorylinks` (
  `cl_from` int(8) unsigned NOT NULL DEFAULT 0,
  `cl_to` varbinary(255) NOT NULL DEFAULT '',
  `cl_sortkey` varbinary(230) NOT NULL DEFAULT '',
  `cl_timestamp` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
  `cl_sortkey_prefix` varbinary(255) NOT NULL DEFAULT '',
  `cl_collation` varbinary(32) NOT NULL DEFAULT '',
  `cl_type` enum('page','subcat','file') NOT NULL DEFAULT 'page',
  PRIMARY KEY (`cl_from`,`cl_to`),
  KEY `cl_timestamp` (`cl_to`,`cl_timestamp`),
  KEY `cl_sortkey` (`cl_to`,`cl_type`,`cl_sortkey`,`cl_from`),
  KEY `cl_collation_ext` (`cl_collation`,`cl_to`,`cl_type`,`cl_from`)
) ENGINE=InnoDB DEFAULT CHARSET=binary ROW_FORMAT=COMPRESSED;

TODO what is cl_from? We've tried:

page_id: nope, there is no page_id of 3

cl_to appears to always be a category string name.

The format appears to be described at: www.mediawiki.org/wiki/Manual:Categorylinks_table

A sample INSERT entry is:

(3,'Computer_storage_devices',88,11,0)


Wikipedia HOWTO by

Ciro Santilli 40 Updated 2025-07-16


Download titles of all Wikipedia articles without redirects by

Ciro Santilli 40 Updated 2025-07-16



Download all Wikipedia categories by

Ciro Santilli 40 Updated 2025-07-16


Our WIP script: wikipedia/import-categories.sh.

Consider:

Jewish_physicists

Let's observe them in MySQL:

mysql enwiki -e "select page_id, page_namespace, page_title, page_is_redirect from page where page_namespace in (0, 14) and page_title in ('Computer_storage_devices', 'Computer_data_storage')"

outputs:

+----------+----------------+--------------------------+------------------+
| page_id  | page_namespace | page_title               | page_is_redirect |
+----------+----------------+--------------------------+------------------+
|     5300 |              0 | Computer_data_storage    |                0 |
| 42371130 |              0 | Computer_storage_devices |                1 |
|   711721 |             14 | Computer_data_storage    |                0 |
|   895945 |             14 | Computer_storage_devices |                0 |
+----------+----------------+--------------------------+------------------+

mysql enwiki -e "select cl_from, cl_to from categorylinks where cl_from in (5300, 711721, 895945, 42371130)"

gives:

+----------+-----------------------------------------------------------------------+
| cl_from  | cl_to                                                                 |
+----------+-----------------------------------------------------------------------+
|     5300 | All_articles_containing_potentially_dated_statements                  |
|     5300 | Articles_containing_potentially_dated_statements_from_2009            |
|     5300 | Articles_containing_potentially_dated_statements_from_2011            |
|     5300 | Articles_with_GND_identifiers                                         |
|     5300 | Articles_with_NKC_identifiers                                         |
|     5300 | Articles_with_short_description                                       |
|     5300 | Computer_architecture                                                 |
|     5300 | Computer_data_storage                                                 |
|     5300 | Short_description_matches_Wikidata                                    |
|     5300 | Use_dmy_dates_from_June_2020                                          |
|     5300 | Wikipedia_articles_incorporating_text_from_the_Federal_Standard_1037C |
|   711721 | Computer_architecture                                                 |
|   711721 | Computer_data                                                         |
|   711721 | Computer_hardware_by_type                                             |
|   711721 | Data_storage                                                          |
|   895945 | Computer_data_storage                                                 |
|   895945 | Computer_peripherals                                                  |
|   895945 | Recording_devices                                                     |
| 42371130 | Redirects_from_alternative_names                                      |
+----------+-----------------------------------------------------------------------+

So we see that cl_from is the page_id of the member (a page or a category), and cl_to is the title of each of its parent categories:

parent categories of categories:
- en.wikipedia.org/wiki/Category:Computer_data_storage, which has ID 711721, has parent categories: "Computer hardware by type", "Computer data", "Data storage", "Computer architecture". This matches exactly on the database. These are all encoded on the source code of the page:
  {{DEFAULTSORT:Storage}} [[Category:Computer hardware by type]] [[Category:Computer data|Storage]] [[Category:Data storage|Computer]] [[Category:Computer architecture]]
- en.wikipedia.org/wiki/Category:Computer_storage_devices has parent categories: "Computer data storage", "Recording devices", "Computer peripherals". This matches exactly on the database.
parent categories of pages:
- en.wikipedia.org/wiki/Computer_storage_devices, which is a redirect, gets the magic category "Redirects_from_alternative_names", a humongous placeholder with many thousands of pages: en.wikipedia.org/wiki/Category:Redirects_from_alternative_names
- en.wikipedia.org/wiki/Computer_data_storage shows only two categories on the web UI: "Computer data storage" and "Computer architecture". Both of these are present on the database and at the end of the source code:
  {{DEFAULTSORT:Computer Data Storage}} [[Category:Computer data storage| ]] [[Category:Computer architecture]]
  The others appear to be more magic. Two of them we can guess from the templates:
  {{short description|Storage of digital data readable by computers}} {{Use dmy dates|date=June 2020}}
  are likely Use_dmy_dates_from_June_2020 and Articles_with_short_description but the rest is more magic and not necessarily present in-source.

So to find all articles and categories directly under a given category title, say en.wikipedia.org/wiki/Category:Mathematics, we can run:

mariadb enwiki -e "select cl_from, cl_to, page_namespace, page_title from categorylinks inner join page on page_namespace in (0, 14) and cl_from = page_id and cl_to = 'Mathematics'"
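To make the join concrete without a full import, here is a small sketch that loads a handful of the rows shown above into an in-memory SQLite database and runs the same kind of query. SQLite stands in for MariaDB here, the tables only keep the columns the query needs, and the helper name category_members is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
    CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, PRIMARY KEY (cl_from, cl_to));
    """
)
# Rows copied from the query outputs above (namespace 0 = article, 14 = category).
conn.executemany(
    "INSERT INTO page VALUES (?, ?, ?)",
    [
        (5300, 0, "Computer_data_storage"),
        (42371130, 0, "Computer_storage_devices"),
        (711721, 14, "Computer_data_storage"),
        (895945, 14, "Computer_storage_devices"),
    ],
)
conn.executemany(
    "INSERT INTO categorylinks VALUES (?, ?)",
    [
        (5300, "Computer_architecture"),
        (5300, "Computer_data_storage"),
        (711721, "Computer_architecture"),
        (711721, "Data_storage"),
        (895945, "Computer_data_storage"),
        (895945, "Recording_devices"),
        (42371130, "Redirects_from_alternative_names"),
    ],
)


def category_members(conn, category_title):
    """Return (page_id, page_namespace, page_title) of the direct members of a category."""
    return conn.execute(
        """
        SELECT page_id, page_namespace, page_title
        FROM categorylinks
        INNER JOIN page ON cl_from = page_id
        WHERE cl_to = ? AND page_namespace IN (0, 14)
        ORDER BY page_id
        """,
        (category_title,),
    ).fetchall()


for page_id, namespace, title in category_members(conn, "Computer_data_storage"):
    kind = "category" if namespace == 14 else "article"
    print(page_id, kind, title)
```

Members with namespace 14 are subcategories, so calling the helper again on their titles would walk further down the tree.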


How to use a single source multiple times in a Wikipedia article? by

Ciro Santilli 40 Updated 2025-07-16


www.quora.com/On-Wikipedia-how-can-you-cite-the-same-source-more-than-once-without-them-becoming-separate-references

en.wikipedia.org/wiki/Help:Footnotes#Footnotes:_using_a_source_more_than_once gives the following method:

Definition, anywhere on article, likely ideally as the first usage:

<ref name="myname">{{cite web ...}}</ref>

And then you can use it later on as:

<ref name="myname" />

which automatically expands to the exact same thing, or using the shortcut:

{{r|myname}}

To cite multiple pages of a book (en.wikipedia.org/wiki/Wikipedia:Citing_sources#Citing_multiple_pages_of_the_same_source), the best method is to define the reference without setting the page or location in cite, as:

<ref name="googleStory">{{cite book |title=The Google Story}}</ref>{{rp|p=123}}

Do not set the page in cite, otherwise it shows up in the references list. Instead, give the page with the {{rp}} template at the definition, and later reuse the reference with the {{r}} template as:

{{r|googleStory|p=456}}

or for multiple pages:

{{r|googleStory|pp=123, 156-158}}


How to cite a book on Wikipedia by

Ciro Santilli 40 Updated 2025-07-16


To avoid duplication when citing multiple pages: Section "How to use a single source multiple times in a Wikipedia article?"

A good big sample definition:

<ref name="googleStory">{{cite book |last1=Vise |first1=David |author-link1=David A. Vise |last2=Malseed |first2=Mark |author-link2=Mark Malseed |title=The Google Story |date=2008 |publisher=Delacorte Press |url=https://archive.org/details/isbn_9780385342728}}</ref>

There is also title-link to link to a wiki page. But it is incompatible with url=, which we use for Internet Archive Open Library links, which is a shame.


Functional equation by

Ciro Santilli 40 Updated 2025-07-16


Wikipedia edit request by

Ciro Santilli 40 Updated 2025-07-16


en.wikipedia.org/wiki/Wikipedia:Edit_requests

So, it turns out that Wikipedia does have an (ultra obscure as usual) mechanism for pull requests. You learn a new one every day.


Wikipedia subpages by

Ciro Santilli 40 Updated 2025-07-16


en.wikipedia.org/wiki/Wikipedia:User_pages

OMG they have that. Slight overlap with OurBigBook.com.


History of Wikipedia by

Ciro Santilli 40 Updated 2025-07-16


A 2022 clone of phabricator.wikimedia.org/source/mediawiki.git gives first commits from 2003 by:

- Lee Daniel Crocker: en.wikipedia.org/wiki/Lee_Daniel_Crocker
  He is best known for rewriting the software upon which Wikipedia runs, to address scalability problems.
  so that gives a good notion of the last major rewrite.
- Brion Vibber

TODO when was Wikipedia open sourced from Nupedia? The early days of Wikipedia are quite obscure due to its transition from Nupedia.


Nupedia by

Ciro Santilli 40 Updated 2025-07-16


Wikipedia analytics by

Ciro Santilli 40 Updated 2025-07-16


Pageviews Analysis by

Ciro Santilli 40 Updated 2025-07-16


Cool tool that allows you to graphically visualize page view counts of specific pages. It offers somewhat similar insights to Google Trends.

Homepage: pageviews.wmcloud.org/

Documentation: meta.wikimedia.org/wiki/Pageviews_Analysis#Massviews

The homepage shows views of selected pages, e.g. around Google's 25th birthday, when Larry Page briefly beat "Cat" and "Dog": pageviews.wmcloud.org/?project=en.wikipedia.org&platform=all-access&agent=user&redirects=0&start=2023-09-11&end=2023-10-01&pages=Cat|Dog|Larry_Page
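Such comparison links can also be assembled programmatically. The sketch below simply mirrors the query parameters visible in the example URL. The function name pageviews_url is hypothetical, the https:// scheme is an assumption, and the parameter names are taken from that one URL rather than from the tool's documentation.

```python
from urllib.parse import urlencode


def pageviews_url(pages, start, end, project="en.wikipedia.org"):
    """Build a Pageviews Analysis link comparing several page titles.

    pages: titles with underscores instead of spaces, e.g. "Larry_Page"
    start, end: dates as YYYY-MM-DD strings
    """
    params = {
        "project": project,
        "platform": "all-access",
        "agent": "user",
        "redirects": 0,
        "start": start,
        "end": end,
        # Titles are separated with "|" as in the example URL.
        "pages": "|".join(pages),
    }
    # safe="|" keeps the separator readable instead of encoding it as %7C.
    return "https://pageviews.wmcloud.org/?" + urlencode(params, safe="|")


print(pageviews_url(["Cat", "Dog", "Larry_Page"], "2023-09-11", "2023-10-01"))
```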

/topviews shows the most viewed pages for a given month: pageviews.wmcloud.org/topviews/?project=en.wikipedia.org&platform=all-access&date=2023-08&excludes= It is extremely epic that XXX: Return of Xander Cage, a 2017 film, is in the top ten for August 2023. The page was around 8th place on a Google search for "xxx" at the time: archive.ph/wip/giRY8 XXXX (beer) was also in the top 20, followed by Sex at 21.


Wikimedia Foundation by

Ciro Santilli 40 Updated 2025-07-16


Wikimedia Foundation project by

Ciro Santilli 40 Updated 2025-07-16


It is not possible to teach natural sciences on Wikipedia by

Ciro Santilli 40 Updated 2025-07-16


Because of edit wars and encyclopedic tone requirements. See also: Wikipedia.

Thus OurBigBook.com.


Wikipedia person by

Ciro Santilli 40 Updated 2025-07-16


Jimmy Wales by

Ciro Santilli 40 Updated 2025-07-16


One thing to note is that Jimmy was a finance worker before starting Wikipedia, e.g. he had capital to hire Larry Sanger.

Maybe that's the way to go about it, make money first, and later on change the world.

Starting just after the beginning of the Internet can't hurt either. Though tooling must have been insane back then.


 Pinned article: Introduction to the OurBigBook Project

Welcome to the OurBigBook Project! Our goal is to create the perfect publishing platform for STEM subjects, and get university-level students to write the best free STEM tutorials ever.

Everyone is welcome to create an account and play with the site: ourbigbook.com/go/register. We believe that students themselves can write amazing tutorials, but teachers are welcome too. You can write about anything you want, it doesn't have to be STEM or even educational. Silly test content is very welcome and you won't be penalized in any way. Just keep it legal!

Video 1. Intro to OurBigBook. Source.

We have two killer features:

topics: topics group articles by different users with the same title, e.g. here is the topic for the "Fundamental Theorem of Calculus" ourbigbook.com/go/topic/fundamental-theorem-of-calculus
Articles of different users are sorted by upvote within each article page. This feature is a bit like:
- a Wikipedia where each user can have their own version of each article
- a Q&A website like Stack Overflow, where multiple people can give their views on a given topic, and the best ones are sorted by upvote. Except you don't need to wait for someone to ask first, and any topic goes, no matter how narrow or broad
This feature makes it possible for readers to find better explanations of any topic created by other writers. And it allows writers to create an explanation in a place that readers might actually find it.
Figure 1. Screenshot of the "Derivative" topic page. View it live at: ourbigbook.com/go/topic/derivative
Video 2. OurBigBook Web topics demo. Source.
local editing: you can store all your personal knowledge base content locally in a plaintext markup format that can be edited locally and published either:
- to OurBigBook.com to get awesome multi-user features like topics and likes
- as HTML files to a static website, which you can host yourself for free on many external providers like GitHub Pages, and remain in full control
This way you can be sure that even if OurBigBook.com were to go down one day (which we have no plans to do as it is quite cheap to host!), your content will still be perfectly readable as a static site.
Figure 2. You can publish local OurBigBook lightweight markup files to either https://OurBigBook.com or as a static website.
Figure 3. Visual Studio Code extension installation.
Figure 4. Visual Studio Code extension tree navigation.
Figure 5. Web editor. You can also edit articles on the Web editor without installing anything locally.
Video 3. Edit locally and publish demo. Source. This shows editing OurBigBook Markup and publishing it using the Visual Studio Code extension.
Video 4. OurBigBook Visual Studio Code extension editing and navigation demo. Source.
Internal cross file references done right:
Infinitely deep tables of contents:
Figure 6. Dynamic article tree with infinitely deep table of contents.
Live URL: ourbigbook.com/cirosantilli/chordate
Descendant pages can also show up as toplevel e.g.: ourbigbook.com/cirosantilli/chordate-subclade