Download all Wikipedia categories by Ciro Santilli 35 Updated +Created
Jewish_physicists
Let's observe them in MySQL:
mysql enwiki -e "select page_id, page_namespace, page_title, page_is_redirect from page where page_namespace in (0, 14) and page_title in ('Computer_storage_devices', 'Computer_data_storage')"
outputs:
+----------+----------------+--------------------------+------------------+
| page_id  | page_namespace | page_title               | page_is_redirect |
+----------+----------------+--------------------------+------------------+
|     5300 |              0 | Computer_data_storage    |                0 |
| 42371130 |              0 | Computer_storage_devices |                1 |
|   711721 |             14 | Computer_data_storage    |                0 |
|   895945 |             14 | Computer_storage_devices |                0 |
+----------+----------------+--------------------------+------------------+
mysql enwiki -e "select cl_from, cl_to from categorylinks where cl_from in (5300, 711721, 895945, 42371130)"
gives:
+----------+-----------------------------------------------------------------------+
| cl_from  | cl_to                                                                 |
+----------+-----------------------------------------------------------------------+
|     5300 | All_articles_containing_potentially_dated_statements                  |
|     5300 | Articles_containing_potentially_dated_statements_from_2009            |
|     5300 | Articles_containing_potentially_dated_statements_from_2011            |
|     5300 | Articles_with_GND_identifiers                                         |
|     5300 | Articles_with_NKC_identifiers                                         |
|     5300 | Articles_with_short_description                                       |
|     5300 | Computer_architecture                                                 |
|     5300 | Computer_data_storage                                                 |
|     5300 | Short_description_matches_Wikidata                                    |
|     5300 | Use_dmy_dates_from_June_2020                                          |
|     5300 | Wikipedia_articles_incorporating_text_from_the_Federal_Standard_1037C |
|   711721 | Computer_architecture                                                 |
|   711721 | Computer_data                                                         |
|   711721 | Computer_hardware_by_type                                             |
|   711721 | Data_storage                                                          |
|   895945 | Computer_data_storage                                                 |
|   895945 | Computer_peripherals                                                  |
|   895945 | Recording_devices                                                     |
| 42371130 | Redirects_from_alternative_names                                      |
+----------+-----------------------------------------------------------------------+
So we see that cl_from encodes the parent categories:
  • parent categories of categories:
    • en.wikipedia.org/wiki/Category:Computer_data_storage, which has ID 711721, has parent categories: "Computer hardware by type", "Computer data", "Data storage", "Computer architecture". This matches exactly on the database. These are all encoded on the source code of the page:
      {{DEFAULTSORT:Storage}}
      [[Category:Computer hardware by type]]
      [[Category:Computer data|Storage]]
      [[Category:Data storage|Computer]]
      [[Category:Computer architecture]]
    • en.wikipedia.org/wiki/Category:Computer_storage_devices has parent categories: "Computer data storage", "Recording devices", "Computer peripherals". This matches exactly on the database.
  • parent categories of pages:
    • en.wikipedia.org/wiki/Computer_storage_devices whish is a redirect gets the magic category "Redirects_from_alternative_names", a humongous placeholder with many thousands of pages: en.wikipedia.org/wiki/Category:Redirects_from_alternative_names
    • en.wikipedia.org/wiki/Computer_data_storage shows only two categories onthe web UI: "Computer data storage" and "Computer architecture". Both of these are present on the database and at the end of the source code:
      {{DEFAULTSORT:Computer Data Storage}}
      [[Category:Computer data storage| ]]
      [[Category:Computer architecture]]
      The others appear to be more magic. Two of them we can guess from the templates:
      {{short description|Storage of digital data readable by computers}}
      {{Use dmy dates|date=June 2020}}
      are likely Use_dmy_dates_from_June_2020 and Articles_with_short_description but the rest is more magic and not necessarily present in-source.
So to find all articls and categories under a given category title, say en.wikipedia.org/wiki/Category:Mathematics we can run:
mariadb enwiki -e "select cl_from, cl_to, page_namespace, page_title from categorylinks inner join page on page_namespace in (0, 14) and cl_from = page_id and cl_to = 'Mathematics'"
csvkit by Ciro Santilli 35 Updated +Created
Lots of features, but slow because written in Python. A faster version may be csvtools. Also some annoyances like obtuse header handing and missing features like grep + cut in one go: csvgrep and select column in csvkit.
csvtool by Ciro Santilli 35 Updated +Created
A compiled executable under /usr/bin/csvtool, has an Ubuntu 23.04 package: manpages.ubuntu.com/manpages/lunar/en/man1/csvtool.1.html
There seems to be no sane filtering mechanism however: stackoverflow.com/questions/46540752/using-csvtool-call-to-filter-csv-in-bash
csvtools by Ciro Santilli 35 Updated +Created
A fast version of a somewhat subset of csvkit, written in C.
Build failed with undefined reference to pcre_config on Ubuntu 23.04: github.com/DavyLandman/csvtools/issues/18
Unfortunately it is lacking some basic options, like optional header + selecting column by index on csvgrep (though csvcut has it). The project seems kind of dead.
Also unclear if it allows to filter + print only selected columns.
xsv by Ciro Santilli 35 Updated +Created
pdftk by Ciro Santilli 35 Updated +Created
Extract certain pages of a PDF:
pdftk input.pdf cat 2-4 output out1.pdf
Rust library by Ciro Santilli 35 Updated +Created
csvgrep and select column in csvkit by Ciro Santilli 35 Updated +Created
There seems to be no way without a pipe, you seem to need to reparse the columns, e.g. the tutorial at: csvkit.readthedocs.io/en/latest/tutorial/2_examining_the_data.html#csvgrep-find-the-data-you-need does:
csvcut -c county,item_name,total_cost data.csv | csvgrep -c county -m LANCASTER
AGI architecture by Ciro Santilli 35 Updated +Created
Video 1.
From Machine Learning to Autonomous Intelligence by Yann LeCun (2023)
Source. After a bunch of B.S., LeCun goes on to describe his AGI architecture. Nothing ground breaking, but not bad either.
IBM Db2 by Ciro Santilli 35 Updated +Created
Oracle Database by Ciro Santilli 35 Updated +Created
Often known simply as SQL Server, a terrible thing that makes it impossible to find portable SQL answers on Google! You just have to Google by specific SQL implementation unfortunately to find anything about the open source ones.
Expired omain trackers by Ciro Santilli 35 Updated +Created
SQLite import CSV by Ciro Santilli 35 Updated +Created

Unlisted articles are being shown, click here to show only listed articles.