1914 Nobel Prize in Physics by Ciro Santilli 35 Updated +Created
Not only did this open the way for X-ray crystallography, it more fundamentally clarified the nature of X-rays as being electromagnetic radiation, and helped further establish the atomic theory.
Quantization as an Eigenvalue Problem by Ciro Santilli 35 Updated +Created
This paper appears to calculate the Schrödinger equation solution for the hydrogen atom.
TODO is this the original paper on the Schrödinger equation?
Published in Annalen der Physik in 1926.
Open access in German at: onlinelibrary.wiley.com/doi/10.1002/andp.19263840404 which gives volume 384, Issue 4, Pages 361-376. Kudos to Wiley for that. E.g. Nature did not have similar policies as of 2023.
This paper may have fallen into the public domain in the US in 2022! On the Internet Archive we can see scans of the journal that contains it at: ia903403.us.archive.org/29/items/sim_annalen-der-physik_1926_79_contents/sim_annalen-der-physik_1926_79_contents.pdf. Ciro Santilli extracted just the paper to: commons.wikimedia.org/w/index.php?title=File%3AQuantisierung_als_Eigenwertproblem.pdf. It is not as well processed as the Wiley one, but it is of 100% guaranteed clean public domain provenance! TODO: hmmm, it may be public domain in the USA but not in Germany, where the rule is 70 years after the author's death, and Schrödinger died in 1961, so it may be protected up to 2031 in that country... messy stuff. There's also the question of whether copyright was transferred to AdP at publication or not.
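Contains formulas such as the Schrödinger equation solution for the hydrogen atom (1''):
$$\left(\frac{\partial \psi}{\partial x}\right)^2 + \left(\frac{\partial \psi}{\partial y}\right)^2 + \left(\frac{\partial \psi}{\partial z}\right)^2 - \frac{2m}{K^2}\left(E + \frac{e^2}{r}\right)\psi^2 = 0$$
where:
  • In order for there to be numerical agreement, $K$ must have the value $h/2\pi$
  • $e$, $m$ are the charge and mass of the electron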
Wikipedia dumps by Ciro Santilli 35 Updated +Created
Per-table dumps created with mysqldump and listed at: dumps.wikimedia.org/. Most notably, for the English Wikipedia: dumps.wikimedia.org/enwiki/latest/
The tables are "documented" under: www.mediawiki.org/wiki/Manual:Database_layout, e.g. the central "page" table: www.mediawiki.org/wiki/Manual:Page_table. But in many cases it is impossible to deduce what fields are from those docs.
Wiley (publisher) by Ciro Santilli 35 Updated +Created
Linux CLI HOWTO by Ciro Santilli 35 Updated +Created
PDF tool by Ciro Santilli 35 Updated +Created
Project Gutenberg by Ciro Santilli 35 Updated +Created
enwiki-latest-category.sql by Ciro Santilli 35 Updated +Created
dumps.wikimedia.org/enwiki/latest/enwiki-latest-category.sql.gz contains a list of categories. It only contains the categories and some counts, but it doesn't contain the subcategories and pages under each category, so it is a bit pointless.
The SQL first defines the table:
CREATE TABLE `category` (
  `cat_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `cat_title` varbinary(255) NOT NULL DEFAULT '',
  `cat_pages` int(11) NOT NULL DEFAULT 0,
  `cat_subcats` int(11) NOT NULL DEFAULT 0,
  `cat_files` int(11) NOT NULL DEFAULT 0,
  PRIMARY KEY (`cat_id`),
  UNIQUE KEY `cat_title` (`cat_title`),
  KEY `cat_pages` (`cat_pages`)
) ENGINE=InnoDB AUTO_INCREMENT=249228235 DEFAULT CHARSET=binary ROW_FORMAT=COMPRESSED;
followed by a few humongous inserts:
INSERT INTO `category` VALUES (2,'Unprintworthy_redirects',1597224,20,0),(3,'Computer_storage_devices',88,11,0)
which we can see at: en.wikipedia.org/wiki/Category:Computer_storage_devices
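We see that en.wikipedia.org/wiki/Category:Computer_storage_devices_by_company is one of its subcategories, yet the dump lists it only as a separate row of its own, so the table contains only categories, not the links between them.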
We can check this with:
sed -s 's/),/\n/g' enwiki-latest-category.sql | grep Computer_storage_devices
and it shows:
(3,'Computer_storage_devices',88,11,0
(521773,'Computer_storage_devices_by_company',6,6,0
Therefore there doesn't seem to be any link between the categories in this table, only page and subcategory counts.
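The sed + grep check above can also be sketched in Python. This is only a minimal sketch: the regular expression and the parse_category_rows name are illustrative assumptions, not part of any Wikimedia tooling, and the sample string just reuses the rows shown above. It matches whole row tuples rather than splitting on "),", and it leaves any backslash escapes inside titles as they appear in the dump.

```python
import re

# Matches one row tuple of the `category` table:
# (cat_id,'cat_title',cat_pages,cat_subcats,cat_files)
# The title part accepts backslash-escaped characters such as \' .
ROW_RE = re.compile(r"\((\d+),'((?:[^'\\]|\\.)*)',(-?\d+),(-?\d+),(-?\d+)\)")


def parse_category_rows(insert_sql):
    """Yield (cat_id, cat_title, cat_pages, cat_subcats, cat_files) tuples.

    Hypothetical helper: titles are returned as they appear in the dump,
    without undoing the backslash escaping.
    """
    for match in ROW_RE.finditer(insert_sql):
        cat_id, title, pages, subcats, files = match.groups()
        yield int(cat_id), title, int(pages), int(subcats), int(files)


# Sample built from the rows shown above, not a real dump line.
sample = (
    "INSERT INTO `category` VALUES "
    "(2,'Unprintworthy_redirects',1597224,20,0),"
    "(3,'Computer_storage_devices',88,11,0),"
    "(521773,'Computer_storage_devices_by_company',6,6,0);"
)

rows = list(parse_category_rows(sample))
for row in rows:
    if row[1].startswith("Computer_storage_devices"):
        print(row)
```

For the real file one would presumably feed each INSERT line of the decompressed dump to parse_category_rows instead of the sample string.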
enwiki-latest-categorylinks.sql by Ciro Santilli 35 Updated +Created
Download all Wikipedia categories by Ciro Santilli 35 Updated +Created
Let's observe them in MySQL:
mysql enwiki -e "select page_id, page_namespace, page_title, page_is_redirect from page where page_namespace in (0, 14) and page_title in ('Computer_storage_devices', 'Computer_data_storage')"
outputs:
+----------+----------------+--------------------------+------------------+
| page_id  | page_namespace | page_title               | page_is_redirect |
+----------+----------------+--------------------------+------------------+
|     5300 |              0 | Computer_data_storage    |                0 |
| 42371130 |              0 | Computer_storage_devices |                1 |
|   711721 |             14 | Computer_data_storage    |                0 |
|   895945 |             14 | Computer_storage_devices |                0 |
+----------+----------------+--------------------------+------------------+
mysql enwiki -e "select cl_from, cl_to from categorylinks where cl_from in (5300, 711721, 895945, 42371130)"
gives:
+----------+-----------------------------------------------------------------------+
| cl_from  | cl_to                                                                 |
+----------+-----------------------------------------------------------------------+
|     5300 | All_articles_containing_potentially_dated_statements                  |
|     5300 | Articles_containing_potentially_dated_statements_from_2009            |
|     5300 | Articles_containing_potentially_dated_statements_from_2011            |
|     5300 | Articles_with_GND_identifiers                                         |
|     5300 | Articles_with_NKC_identifiers                                         |
|     5300 | Articles_with_short_description                                       |
|     5300 | Computer_architecture                                                 |
|     5300 | Computer_data_storage                                                 |
|     5300 | Short_description_matches_Wikidata                                    |
|     5300 | Use_dmy_dates_from_June_2020                                          |
|     5300 | Wikipedia_articles_incorporating_text_from_the_Federal_Standard_1037C |
|   711721 | Computer_architecture                                                 |
|   711721 | Computer_data                                                         |
|   711721 | Computer_hardware_by_type                                             |
|   711721 | Data_storage                                                          |
|   895945 | Computer_data_storage                                                 |
|   895945 | Computer_peripherals                                                  |
|   895945 | Recording_devices                                                     |
| 42371130 | Redirects_from_alternative_names                                      |
+----------+-----------------------------------------------------------------------+
So we see that cl_from is the page_id of the member page or subcategory, and cl_to encodes its parent categories.
So to find all articles and categories under a given category title, say en.wikipedia.org/wiki/Category:Mathematics, we can run:
mariadb enwiki -e "select cl_from, cl_to, page_namespace, page_title from categorylinks inner join page on page_namespace in (0, 14) and cl_from = page_id and cl_to = 'Mathematics'"
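The query above only returns direct members. To walk the whole tree one has to follow subcategories (namespace 14) recursively. Here is a minimal in-memory sketch of that idea in Python, assuming the page and categorylinks rows have already been loaded into plain Python structures. The descendants function name is hypothetical and the data is just a subset of the query outputs shown above.

```python
from collections import defaultdict, deque

# page_id -> (page_namespace, page_title), copied from the `page` output above.
pages = {
    5300: (0, "Computer_data_storage"),
    42371130: (0, "Computer_storage_devices"),
    711721: (14, "Computer_data_storage"),
    895945: (14, "Computer_storage_devices"),
}

# (cl_from, cl_to) pairs, a subset of the `categorylinks` output above.
categorylinks = [
    (5300, "Computer_architecture"),
    (5300, "Computer_data_storage"),
    (711721, "Computer_architecture"),
    (711721, "Data_storage"),
    (895945, "Computer_data_storage"),
    (895945, "Computer_peripherals"),
    (42371130, "Redirects_from_alternative_names"),
]


def descendants(root_title, pages, categorylinks):
    """Return every (namespace, title) reachable under the category root_title."""
    # Parent category title -> list of (namespace, title) of direct members.
    children = defaultdict(list)
    for cl_from, cl_to in categorylinks:
        if cl_from in pages:
            children[cl_to].append(pages[cl_from])

    found = []
    seen_pages = set()
    seen_categories = set()
    queue = deque([root_title])
    while queue:
        category = queue.popleft()
        if category in seen_categories:
            continue  # category graphs can contain cycles
        seen_categories.add(category)
        for member in children.get(category, []):
            if member not in seen_pages:
                seen_pages.add(member)
                found.append(member)
            namespace, title = member
            if namespace == 14:
                queue.append(title)  # subcategory: descend into it later
    return found


for namespace, title in descendants("Data_storage", pages, categorylinks):
    print(namespace, title)
```

On the full enwiki tables this would likely need a lot of memory, so it is meant to illustrate the traversal rather than be a practical tool.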
csvkit by Ciro Santilli 35 Updated +Created
Lots of features, but slow because it is written in Python. A faster alternative may be csvtools. There are also some annoyances like obtuse header handling and missing features like grep + cut in one go: csvgrep and select column in csvkit.
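For reference, the missing "grep + cut in one go" operation can be sketched with just the Python standard library csv module. This is not how csvkit does things: grep_cut is a hypothetical helper, and the sample CSV is made up from the category rows seen earlier.

```python
import csv
import io
import re


def grep_cut(infile, outfile, match_column, pattern, keep_columns):
    """Keep rows where match_column matches pattern, write only keep_columns."""
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(
        outfile,
        fieldnames=keep_columns,
        extrasaction="ignore",  # silently drop the columns we do not keep
        lineterminator="\n",
    )
    writer.writeheader()
    regex = re.compile(pattern)
    for row in reader:
        if regex.search(row[match_column]):
            writer.writerow(row)


# Made-up sample input for illustration.
data = (
    "cat_id,cat_title,cat_pages\n"
    "2,Unprintworthy_redirects,1597224\n"
    "3,Computer_storage_devices,88\n"
    "521773,Computer_storage_devices_by_company,6\n"
)

out = io.StringIO()
grep_cut(io.StringIO(data), out, "cat_title", r"^Computer_storage", ["cat_id", "cat_pages"])
print(out.getvalue(), end="")
```

Note that the column being matched does not have to be one of the columns that gets printed.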
csvtool by Ciro Santilli 35 Updated +Created
A compiled executable under /usr/bin/csvtool, has an Ubuntu 23.04 package: manpages.ubuntu.com/manpages/lunar/en/man1/csvtool.1.html
There seems to be no sane filtering mechanism however: stackoverflow.com/questions/46540752/using-csvtool-call-to-filter-csv-in-bash
csvtools by Ciro Santilli 35 Updated +Created
A fast implementation of a subset of csvkit, written in C.
Build failed with undefined reference to pcre_config on Ubuntu 23.04: github.com/DavyLandman/csvtools/issues/18
Unfortunately it is lacking some basic options, like optional header + selecting column by index on csvgrep (though csvcut has it). The project seems kind of dead.
It is also unclear whether it allows filtering and printing only selected columns in one go.
xsv by Ciro Santilli 35 Updated +Created
pdftk by Ciro Santilli 35 Updated +Created
Extract certain pages of a PDF:
pdftk input.pdf cat 2-4 output out1.pdf
Rust library by Ciro Santilli 35 Updated +Created
