Wikipedia (2001)

Why Wikipedia sucks: Section "Wikipedia".

Best languages:

latin
esperanto. Other constructed languages: en.wikipedia.org/wiki/Wikipedia:List_of_constructed_languages_with_Wikipedias

The most important page of Wikipedia is undoubtedly: en.wikipedia.org/wiki/Wikipedia:Reliable_sources/Perennial_sources which lists the accepted and non-accepted sources. Basically, the decision of what is true in this world.

Wikipedia is incredibly picky about copyright. E.g.: en.wikipedia.org/wiki/Wikipedia:Deletion_of_all_fair_use_images_of_living_people because "such portrait could be created". Yes, with a time machine, no problem! This does more harm than good... excessive!

Citing in Wikipedia is painful. Partly because they have a billion different templates that you have to navigate. They should really have a system where you can easily reuse existing sources across articles! Section "How to use a single source multiple times in a Wikipedia article?"

Video 1. What Happened To Wikipedia's Founders? Source.

youtu.be/_Rt0eAPLDkM?t=113 encyclopedia correction stickers. OMG!
youtu.be/_Rt0eAPLDkM?t=201 Jimmy was a moderator on MUD games

Video 2. Inside the Wikimedia Foundation offices by Wikimedia Foundation (2008) Source.


Deletionism on Wikipedia

en.wikipedia.org/wiki/Deletionism_and_inclusionism_in_Wikipedia

Some examples by Ciro Santilli follow.

Of the tutorial-subjectivity type:

This edit perfectly summarizes how Ciro feels about Wikipedia (no particular hate towards that user, he was a teacher at the prestigious Pierre and Marie Curie University and actually has a wiki page about him):
rm a cryptic diagram (not understandable by a professional mathematician, without further explanations
which removed the only diagram that was actually understandable to non-mathematicians, which Ciro Santilli had created, and which received many upvotes at: math.stackexchange.com/questions/776039/intuition-behind-normal-subgroups/3732426#3732426. The removal does not generate any notifications to you unless you follow the page, which would lead to infinite noise, and it is extremely difficult to find out how to contact the other person. The removal justification is even somewhat ad hominem: how does he know Ciro Santilli is not also a professional mathematician? :-) Maybe it is obvious because Ciro explains in a way that is understandable. Also, the removal makes no effort to contact the original author. Of course, this is caused by the fact that there must also have been a bunch of useless edits not done by Ciro, and there is no reputation system to see immediately if you should ignore a person or not, so the removal author has no patience anymore. This is what makes it impossible to contribute to Wikipedia: your stuff gets deleted at any time, and you don't know how to appeal it. Ciro is going to regret having written this rant after Daniel replies and shows the diagram is crap. But that would be better than not getting a reply and not learning that the diagram is crap.
en.wikipedia.org/w/index.php?title=Finite_field&type=revision&diff=1044934168&oldid=1044905041 on finite fields with edit comment "Obviously: X ≡ α". Discussion at en.wikipedia.org/wiki/Talk:Finite_field#Concrete_simple_worked_out_example. Some people simply don't know how to explain things to beginners, or don't think Wikipedia is where it should be done. One simply can't waste time fighting off those people: writing good tutorials is hard enough in itself without that fight.

Notability constraints, which are way too strict:

even information about important companies can be disputed. E.g. once Ciro Santilli tried to create a page for PsiQuantum, a startup with $650m in funding, and there was a deletion proposal because it did not contain verifiable sources not linked directly to information provided by the company itself: en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/PsiQuantum Although this argument is correct, it is also true about 90% of everything that is on Wikipedia about any company. Where else can you get any information about a B2B company? Their clients are not going to say anything. Lawsuits and scandals are kind of the only possible source... In that case, the page was deleted with 2 votes against vs 3 votes for deletion.
"should we delete this extremely likely useful/correct content or not according to this extremely complex system of guidelines" is very similar to Stack Exchange's own Stack Overflow content deletion issues. Ain't Nobody Got Time For That. "Ain't Nobody Got Time for That" actually has a Wiki page: en.wikipedia.org/wiki/Ain%27t_Nobody_Got_Time_for_That. That's notable. Unlike a $600M+ company of course.

There are even wikis that were created to remove notability constraints: Wiki without notability requirements.

These are the reasons why Ciro basically only contributes images to Wikipedia: because they are either all in or all out, and you can determine which one of them it is. And this allows images to be more attributable, so people can actually see that it was Ciro that created a given amazing image, thus overcoming Wikipedia's lack of reputation system a little bit as well.

Wikipedia is perfect for things like biographies, geography, or history, which have a much more defined and less subjective expository order. But when it comes to "tutorials of how to actually do stuff", which is what mathematics and physics are basically about, Wikipedia has a very hard time going beyond dry definitions which are only useful for people who already half know the stuff. But to learn from zero, newbies need tutorials with intuition and examples.

Bibliography:

gwern.net/inclusionism from gwern.net:
Iron Law of Bureaucracy: the downwards deletionism spiral discourages contribution and is how Wikipedia will die.

Wikipedia dumps

Per-table dumps created with mysqldump and listed at: dumps.wikimedia.org/. Most notably, for the English Wikipedia: dumps.wikimedia.org/enwiki/latest/

A few of the files are not actual tables but derived data, notably dumps.wikimedia.org/enwiki/latest/enwiki-latest-all-titles-in-ns0.gz from Download titles of all Wikipedia articles

The tables are "documented" under: www.mediawiki.org/wiki/Manual:Database_layout, e.g. the central "page" table: www.mediawiki.org/wiki/Manual:Page_table. But in many cases it is impossible to deduce what fields are from those docs.

enwiki-latest-category.sql

dumps.wikimedia.org/enwiki/latest/enwiki-latest-category.sql.gz contains a list of categories. It only contains the categories and some counts, but it doesn't contain the subcategories and pages under each category, so it is a bit pointless.

The schema is listed at: www.mediawiki.org/wiki/Manual:Category_table

The SQL first defines the table:

CREATE TABLE `category` (
  `cat_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `cat_title` varbinary(255) NOT NULL DEFAULT '',
  `cat_pages` int(11) NOT NULL DEFAULT 0,
  `cat_subcats` int(11) NOT NULL DEFAULT 0,
  `cat_files` int(11) NOT NULL DEFAULT 0,
  PRIMARY KEY (`cat_id`),
  UNIQUE KEY `cat_title` (`cat_title`),
  KEY `cat_pages` (`cat_pages`)
) ENGINE=InnoDB AUTO_INCREMENT=249228235 DEFAULT CHARSET=binary ROW_FORMAT=COMPRESSED;

followed by a few humongous inserts:

INSERT INTO `category` VALUES (2,'Unprintworthy_redirects',1597224,20,0),(3,'Computer_storage_devices',88,11,0)

which we can see at: en.wikipedia.org/wiki/Category:Computer_storage_devices

We see that:

en.wikipedia.org/wiki/Category:Computer_storage_devices_by_company is a subcategory of that category and it appears in that file.
en.wikipedia.org/wiki/Acronis_Secure_Zone is a page of the category, and it does not appear

so the file contains only categories.

We can check this with:

sed -s 's/),/\n/g' enwiki-latest-category.sql | grep Computer_storage_devices

and it shows:

(3,'Computer_storage_devices',88,11,0
(521773,'Computer_storage_devices_by_company',6,6,0

Therefore there doesn't seem to be any interlink between the categories in this file, only page and subcategory counts.
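The sed and grep check above can also be sketched in Python. This is only a rough illustration, assuming every row follows the five-column layout of the CREATE TABLE above; the names `ROW` and `parse_category_rows` are made up for this example, and titles are returned raw, with any backslash escapes left untouched.

```python
import re

# One (cat_id,'cat_title',cat_pages,cat_subcats,cat_files) tuple.
# The title pattern also accepts backslash-escaped characters such as \'.
ROW = re.compile(r"\((\d+),'((?:[^'\\]|\\.)*)',(-?\d+),(-?\d+),(-?\d+)\)")

def parse_category_rows(sql_line):
    """Yield (cat_id, cat_title, cat_pages, cat_subcats, cat_files) tuples
    found in one INSERT line of enwiki-latest-category.sql."""
    for cat_id, title, pages, subcats, files in ROW.findall(sql_line):
        # Titles are kept as they appear in the dump (escapes not decoded).
        yield int(cat_id), title, int(pages), int(subcats), int(files)

# The sample INSERT shown above.
line = (
    "INSERT INTO `category` VALUES "
    "(2,'Unprintworthy_redirects',1597224,20,0),"
    "(3,'Computer_storage_devices',88,11,0)"
)
rows = list(parse_category_rows(line))
storage = [row for row in rows if "Computer_storage_devices" in row[1]]
```

For the full dump one would feed the file line by line to `parse_category_rows` rather than loading the humongous inserts at once.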

enwiki-latest-categorylinks.sql

dumps.wikimedia.org/enwiki/latest/enwiki-latest-categorylinks.sql.gz

The schema is listed at: www.mediawiki.org/wiki/Manual:Categorylinks_table

On the SQL:

CREATE TABLE `categorylinks` (
  `cl_from` int(8) unsigned NOT NULL DEFAULT 0,
  `cl_to` varbinary(255) NOT NULL DEFAULT '',
  `cl_sortkey` varbinary(230) NOT NULL DEFAULT '',
  `cl_timestamp` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
  `cl_sortkey_prefix` varbinary(255) NOT NULL DEFAULT '',
  `cl_collation` varbinary(32) NOT NULL DEFAULT '',
  `cl_type` enum('page','subcat','file') NOT NULL DEFAULT 'page',
  PRIMARY KEY (`cl_from`,`cl_to`),
  KEY `cl_timestamp` (`cl_to`,`cl_timestamp`),
  KEY `cl_sortkey` (`cl_to`,`cl_type`,`cl_sortkey`,`cl_from`),
  KEY `cl_collation_ext` (`cl_collation`,`cl_to`,`cl_type`,`cl_from`)
) ENGINE=InnoDB DEFAULT CHARSET=binary ROW_FORMAT=COMPRESSED;

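TODO what is cl_from? A first attempt at matching it to page_id failed:

page_id: nope, there is no page_id of 3

but the queries under "Download all Wikipedia categories" indicate that it is the page_id of the page or subcategory that belongs to the category.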

cl_to appears to always be a category string name.

The format appears to be described at: www.mediawiki.org/wiki/Manual:Categorylinks_table

A sample INSERT entry is:

(3,'Computer_storage_devices',88,11,0)

Wikipedia HOWTO

Download titles of all Wikipedia articles

stackoverflow.com/questions/24474288/how-to-obtain-a-list-of-titles-of-all-wikipedia-articles

dumps.wikimedia.org/enwiki/latest/enwiki-latest-all-titles-in-ns0.gz Characterization:

contains redirects, e.g. en.wikipedia.org/wiki/"Ampere_North" redirects to en.wikipedia.org/wiki/Ampere_North,_New_Jersey and both are present. Noted in this comment: stackoverflow.com/questions/24474288/how-to-obtain-a-list-of-titles-of-all-wikipedia-articles#comment136016773_24474476
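A minimal sketch of reading that file in Python follows. It assumes the dump is gzip-compressed UTF-8 text with one title per line and a first header line reading "page_title"; the header is an assumption about the dump format, and `iter_titles` is a name made up for this example.

```python
import gzip
import io

def iter_titles(fileobj):
    """Yield article titles from a gzip-compressed titles dump.

    fileobj may be a path or a binary file object, as accepted by gzip.open.
    """
    with gzip.open(fileobj, mode="rt", encoding="utf-8") as lines:
        for i, line in enumerate(lines):
            title = line.rstrip("\n")
            # Assumption: the first line is a "page_title" header, not a title.
            if i == 0 and title == "page_title":
                continue
            if title:
                yield title

# Tiny in-memory stand-in for enwiki-latest-all-titles-in-ns0.gz.
sample = io.BytesIO(
    gzip.compress(b'page_title\n"Ampere_North"\nAmpere_North,_New_Jersey\n')
)
titles = list(iter_titles(sample))
```

Since redirects are included, the list is a superset of the actual articles, as the "Ampere_North" pair shows.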

Download titles of all Wikipedia articles without redirects

Download all Wikipedia categories

Our WIP script: wikipedia/import-categories.sh.

Consider:

Jewish_physicists

Let's observe them in MySQL:

mysql enwiki -e "select page_id, page_namespace, page_title, page_is_redirect from page where page_namespace in (0, 14) and page_title in ('Computer_storage_devices', 'Computer_data_storage')"

outputs:

+----------+----------------+--------------------------+------------------+
| page_id  | page_namespace | page_title               | page_is_redirect |
+----------+----------------+--------------------------+------------------+
|     5300 |              0 | Computer_data_storage    |                0 |
| 42371130 |              0 | Computer_storage_devices |                1 |
|   711721 |             14 | Computer_data_storage    |                0 |
|   895945 |             14 | Computer_storage_devices |                0 |
+----------+----------------+--------------------------+------------------+

mysql enwiki -e "select cl_from, cl_to from categorylinks where cl_from in (5300, 711721, 895945, 42371130)"

gives:

+----------+-----------------------------------------------------------------------+
| cl_from  | cl_to                                                                 |
+----------+-----------------------------------------------------------------------+
|     5300 | All_articles_containing_potentially_dated_statements                  |
|     5300 | Articles_containing_potentially_dated_statements_from_2009            |
|     5300 | Articles_containing_potentially_dated_statements_from_2011            |
|     5300 | Articles_with_GND_identifiers                                         |
|     5300 | Articles_with_NKC_identifiers                                         |
|     5300 | Articles_with_short_description                                       |
|     5300 | Computer_architecture                                                 |
|     5300 | Computer_data_storage                                                 |
|     5300 | Short_description_matches_Wikidata                                    |
|     5300 | Use_dmy_dates_from_June_2020                                          |
|     5300 | Wikipedia_articles_incorporating_text_from_the_Federal_Standard_1037C |
|   711721 | Computer_architecture                                                 |
|   711721 | Computer_data                                                         |
|   711721 | Computer_hardware_by_type                                             |
|   711721 | Data_storage                                                          |
|   895945 | Computer_data_storage                                                 |
|   895945 | Computer_peripherals                                                  |
|   895945 | Recording_devices                                                     |
| 42371130 | Redirects_from_alternative_names                                      |
+----------+-----------------------------------------------------------------------+

So we see that each row links the page ID of a page or category in cl_from to one of its parent categories in cl_to:

parent categories of categories:
- en.wikipedia.org/wiki/Category:Computer_data_storage, which has ID 711721, has parent categories: "Computer hardware by type", "Computer data", "Data storage", "Computer architecture". This matches exactly on the database. These are all encoded on the source code of the page:
```
{{DEFAULTSORT:Storage}}
[[Category:Computer hardware by type]]
[[Category:Computer data|Storage]]
[[Category:Data storage|Computer]]
[[Category:Computer architecture]]
```
- en.wikipedia.org/wiki/Category:Computer_storage_devices has parent categories: "Computer data storage", "Recording devices", "Computer peripherals". This matches exactly on the database.
parent categories of pages:
- en.wikipedia.org/wiki/Computer_storage_devices, which is a redirect, gets the magic category "Redirects_from_alternative_names", a humongous placeholder with many thousands of pages: en.wikipedia.org/wiki/Category:Redirects_from_alternative_names
- en.wikipedia.org/wiki/Computer_data_storage shows only two categories on the web UI: "Computer data storage" and "Computer architecture". Both of these are present on the database and at the end of the source code:
```
{{DEFAULTSORT:Computer Data Storage}}
[[Category:Computer data storage| ]]
[[Category:Computer architecture]]
```
  The others appear to be more magic. Two of them we can guess from the templates:
```
{{short description|Storage of digital data readable by computers}}
{{Use dmy dates|date=June 2020}}
```
  are likely Use_dmy_dates_from_June_2020 and Articles_with_short_description but the rest is more magic and not necessarily present in-source.

So to find all articles and categories directly under a given category title, say en.wikipedia.org/wiki/Category:Mathematics we can run:

mariadb enwiki -e "select cl_from, cl_to, page_namespace, page_title from categorylinks inner join page on page_namespace in (0, 14) and cl_from = page_id and cl_to = 'Mathematics'"
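The query above only returns direct members. To go deeper, one could repeat it for every subcategory found. The following is a rough sketch of that idea, not the import script itself: it swaps in an in-memory SQLite database for MySQL/MariaDB so that it is self-contained, loads only a few rows copied from the query outputs above, keeps only the columns the join needs, and uses made-up helper names (`make_db`, `members`, `walk`).

```python
import sqlite3
from collections import deque

# (page_id, page_namespace, page_title), from the page query output above.
PAGES = [
    (5300, 0, "Computer_data_storage"),
    (42371130, 0, "Computer_storage_devices"),
    (711721, 14, "Computer_data_storage"),
    (895945, 14, "Computer_storage_devices"),
]
# (cl_from, cl_to), a subset of the categorylinks query output above.
LINKS = [
    (5300, "Computer_architecture"),
    (5300, "Computer_data_storage"),
    (711721, "Computer_architecture"),
    (711721, "Computer_data"),
    (895945, "Computer_data_storage"),
    (895945, "Recording_devices"),
    (42371130, "Redirects_from_alternative_names"),
]

def make_db():
    """Build the toy SQLite database (stand-in for the real enwiki tables)."""
    db = sqlite3.connect(":memory:")
    db.execute("create table page (page_id integer primary key, "
               "page_namespace integer, page_title text)")
    db.execute("create table categorylinks (cl_from integer, cl_to text)")
    db.executemany("insert into page values (?, ?, ?)", PAGES)
    db.executemany("insert into categorylinks values (?, ?)", LINKS)
    return db

def members(db, category):
    """Direct members of a category as (page_namespace, page_title) rows."""
    return db.execute(
        "select page_namespace, page_title from categorylinks "
        "inner join page on cl_from = page_id "
        "where cl_to = ? and page_namespace in (0, 14) "
        "order by page_namespace, page_title",
        (category,),
    ).fetchall()

def walk(db, root):
    """Breadth-first walk from root: articles are namespace 0, subcategories 14."""
    seen, articles, queue = {root}, set(), deque([root])
    while queue:
        for namespace, title in members(db, queue.popleft()):
            if namespace == 0:
                articles.add(title)
            elif title not in seen:  # guards against cycles in the category graph
                seen.add(title)
                queue.append(title)
    return sorted(articles), sorted(seen)

db = make_db()
direct = members(db, "Computer_data_storage")
articles, categories = walk(db, "Computer_architecture")
```

On the real database the same walk would issue one query per category, so for large trees it would likely be better to load categorylinks into memory once.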

How to use a single source multiple times in a Wikipedia article?

www.quora.com/On-Wikipedia-how-can-you-cite-the-same-source-more-than-once-without-them-becoming-separate-references

en.wikipedia.org/wiki/Help:Footnotes#Footnotes:_using_a_source_more_than_once gives the following method:

Definition, anywhere on article, likely ideally as the first usage:

<ref name="myname">{{cite web ...}}</ref>

And then you can use it later on as:

<ref name="myname" />

which automatically expands to the exact same thing, or using the shortcut:

{{r|myname}}

To cite multiple pages of a book: en.wikipedia.org/wiki/Wikipedia:Citing_sources#Citing_multiple_pages_of_the_same_source, the best method is to define and use the reference without adding the p or location in cite as:

<ref name="googleStory">{{cite book |title=The Google Story}}</ref>{{rp|p=123}}

Do not set the page in cite, otherwise it shows up on the references. Instead we use the {{rp}} template. And then use the reference with the {{r}} template as:

{{r|googleStory|p=456}}

or for multiple pages:

{{r|googleStory|pp=123, 156-158}}

How to cite a book on Wikipedia

To avoid duplication when citing multiple pages: Section "How to use a single source multiple times in a Wikipedia article?"

A good big sample definition:

<ref name="googleStory">{{cite book |last1=Vise |first1=David |author-link1=David A. Vise |last2=Malseed |first2=Mark |author-link2=Mark Malseed |title=The Google Story |date=2008 |publisher=Delacorte Press |url=https://archive.org/details/isbn_9780385342728}}</ref>

There is also title-link to link to a wiki page. But it is incompatible with url= for Internet Archive Open Library links which is a shame.

Wikipedia edit request

en.wikipedia.org/wiki/Wikipedia:Edit_requests

So, it turns out that Wikipedia does have an (ultra obscure, as usual) mechanism for pull requests. You learn a new one every day.

Wikipedia subpages

en.wikipedia.org/wiki/Wikipedia:User_pages

OMG they have that. Slightly overlaps with OurBigBook.com.

History of Wikipedia

A 2022 clone of phabricator.wikimedia.org/source/mediawiki.git gives first commits from 2003 by:

Lee Daniel Crocker: en.wikipedia.org/wiki/Lee_Daniel_Crocker
He is best known for rewriting the software upon which Wikipedia runs, to address scalability problems.
so that gives a good notion of the last major rewrite.
Brion Vibber

TODO when was Wikipedia open sourced from Nupedia? The early days of Wikipedia are quite obscure due to its transition from Nupedia.

Nupedia

Wikipedia analytics (How to view how many visits a Wikipedia page has?)

Pageviews Analysis

Cool tool that allows you to graphically visualize page view counts of specific pages. It offers somewhat similar insights to Google Trends.

Homepage: pageviews.wmcloud.org/

Documentation: meta.wikimedia.org/wiki/Pageviews_Analysis#Massviews

The homepage shows views of selected pages, e.g. when Google had their 25th birthday: pageviews.wmcloud.org/?project=en.wikipedia.org&platform=all-access&agent=user&redirects=0&start=2023-09-11&end=2023-10-01&pages=Cat|Dog|Larry_Page Larry Page briefly beat "Cat" and "Dog".
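Such URLs can be generated programmatically. The following is a small sketch that only reuses the query parameters visible in the URL above; the `https://` scheme and the helper name `pageviews_url` are assumptions of this example, and page titles are expected with underscores instead of spaces.

```python
from urllib.parse import urlencode

def pageviews_url(pages, start, end, project="en.wikipedia.org"):
    """Build a Pageviews Analysis URL comparing the given page titles.

    start and end are YYYY-MM-DD strings, as in the URL above.
    """
    params = {
        "project": project,
        "platform": "all-access",
        "agent": "user",
        "redirects": 0,
        "start": start,
        "end": end,
        # Titles are separated by "|", which is kept unescaped as in the URL above.
        "pages": "|".join(pages),
    }
    return "https://pageviews.wmcloud.org/?" + urlencode(params, safe="|")

url = pageviews_url(["Cat", "Dog", "Larry_Page"], "2023-09-11", "2023-10-01")
```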

/topviews shows the most viewed pages for a given month: pageviews.wmcloud.org/topviews/?project=en.wikipedia.org&platform=all-access&date=2023-08&excludes= It is extremely epic that XXX: Return of Xander Cage, a 2017 film, is in the top ten for August 2023. The page was around 8th place on a Google search for "xxx": archive.ph/wip/giRY8 at the time. XXXX (beer) was also in the top 20, followed by Sex at 21.

Wikimedia Foundation

Wikimedia Foundation project

Wikidata

It is not possible to teach natural sciences on Wikipedia

Because of edit wars and encyclopedic tone requirements. See also: Wikipedia.

Thus OurBigBook.com.

Wikipedia person

Jimmy Wales

One thing to note is that Jimmy was a finance worker before starting Wikipedia, e.g. he had capital to hire Larry Sanger.

Maybe that's the way to go about it, make money first, and later on change the world.

Starting just after the beginning of the Internet can't hurt either. Though tooling must have been insane back then.

Steven Pruitt

Video 1. Meet the man behind a third of what's on Wikipedia. Source.

MediaWiki

Open source software engine created for and used by Wikipedia.

MediaWiki instance

en.wikialpha.org/wiki/Main_Page

MediaWiki markup

www.mediawiki.org/wiki/Markup_spec

How to reference a book in Wikipedia markup?

Their reference markup is incredibly overengineered, convoluted, and underdocumented. It is unbelievable!

Use the reference:

This is a fact.{{sfn|Schweber|1994|p=487}}

Define the reference:

===Sources===
{{refbegin|2|indent=yes}}
*{{Cite book|author-link=Silvan S. Schweber |title=QED and the Men Who Made It: Dyson, Feynman, Schwinger, and Tomonaga|last=Schweber|first=Silvan S.|location=Princeton|publisher=University Press|year=1994 |isbn=978-0-691-03327-3 |url=https://archive.org/details/qedmenwhomadeitd0000schw/page/492 |url-access=registration}}
{{refend}}

sfn is magic and matches the author last name and date from the Cite. It is documented at: en.wikipedia.org/wiki/Template:Sfn

Unfortunately, if there are multiple duplicate Cites inline in the article, it will complain that there are multiple definitions, and you have to first refactor the article by replacing all those existing Cites with sfn, and keeping just one Cite at the bottom. What a pain...

You can also link to a specific page of the book, e.g. if the book is on Internet Archive Open Library, with:

{{sfn|Murray|1997|p=[https://archive.org/details/supermenstory00murr/page/86 86]}}

For multiple pages you should use pp= instead of p=. It does not seem to make much difference on the rendered output besides showing p. vs pp., but so be it:

{{sfn|Murray|1997|pp=[https://archive.org/details/supermenstory00murr/page/86 86-87]}}

Ciro Santilli's Wikipedia contributions

Let's see how long they last:

Julian Schwinger: en.wikipedia.org/w/index.php?title=Julian_Schwinger&oldid=1039812272 greatly expanded the Early life and career with information from the book QED and the men who made it: Dyson, Feynman, Schwinger, and Tomonaga by Silvan Schweber (1994)