Database management system
Software that implements a database system. PostgreSQL and MySQL are two widely used SQL implementations.
Download all Wikipedia categories
Jewish_physicists
Let's observe them in MySQL:
mysql enwiki -e "select page_id, page_namespace, page_title, page_is_redirect from page where page_namespace in (0, 14) and page_title in ('Computer_storage_devices', 'Computer_data_storage')"
outputs:
+----------+----------------+--------------------------+------------------+
| page_id  | page_namespace | page_title               | page_is_redirect |
+----------+----------------+--------------------------+------------------+
|     5300 |              0 | Computer_data_storage    |                0 |
| 42371130 |              0 | Computer_storage_devices |                1 |
|   711721 |             14 | Computer_data_storage    |                0 |
|   895945 |             14 | Computer_storage_devices |                0 |
+----------+----------------+--------------------------+------------------+
mysql enwiki -e "select cl_from, cl_to from categorylinks where cl_from in (5300, 711721, 895945, 42371130)"
gives:
+----------+-----------------------------------------------------------------------+
| cl_from  | cl_to                                                                 |
+----------+-----------------------------------------------------------------------+
|     5300 | All_articles_containing_potentially_dated_statements                  |
|     5300 | Articles_containing_potentially_dated_statements_from_2009            |
|     5300 | Articles_containing_potentially_dated_statements_from_2011            |
|     5300 | Articles_with_GND_identifiers                                         |
|     5300 | Articles_with_NKC_identifiers                                         |
|     5300 | Articles_with_short_description                                       |
|     5300 | Computer_architecture                                                 |
|     5300 | Computer_data_storage                                                 |
|     5300 | Short_description_matches_Wikidata                                    |
|     5300 | Use_dmy_dates_from_June_2020                                          |
|     5300 | Wikipedia_articles_incorporating_text_from_the_Federal_Standard_1037C |
|   711721 | Computer_architecture                                                 |
|   711721 | Computer_data                                                         |
|   711721 | Computer_hardware_by_type                                             |
|   711721 | Data_storage                                                          |
|   895945 | Computer_data_storage                                                 |
|   895945 | Computer_peripherals                                                  |
|   895945 | Recording_devices                                                     |
| 42371130 | Redirects_from_alternative_names                                      |
+----------+-----------------------------------------------------------------------+
So we see that cl_from is the ID of a page or category, and cl_to encodes its parent categories:
  • parent categories of categories:
    • en.wikipedia.org/wiki/Category:Computer_data_storage, which has ID 711721, has parent categories: "Computer hardware by type", "Computer data", "Data storage", "Computer architecture". This matches exactly on the database. These are all encoded on the source code of the page:
      {{DEFAULTSORT:Storage}}
      [[Category:Computer hardware by type]]
      [[Category:Computer data|Storage]]
      [[Category:Data storage|Computer]]
      [[Category:Computer architecture]]
    • en.wikipedia.org/wiki/Category:Computer_storage_devices has parent categories: "Computer data storage", "Recording devices", "Computer peripherals". This matches exactly on the database.
  • parent categories of pages:
    • en.wikipedia.org/wiki/Computer_storage_devices, which is a redirect, gets the magic category "Redirects_from_alternative_names", a humongous placeholder with many thousands of pages: en.wikipedia.org/wiki/Category:Redirects_from_alternative_names
    • en.wikipedia.org/wiki/Computer_data_storage shows only two categories on the web UI: "Computer data storage" and "Computer architecture". Both of these are present in the database and at the end of the source code:
      {{DEFAULTSORT:Computer Data Storage}}
      [[Category:Computer data storage| ]]
      [[Category:Computer architecture]]
      The others appear to be more magic. Two of them we can guess from the templates:
      {{short description|Storage of digital data readable by computers}}
      {{Use dmy dates|date=June 2020}}
      are likely Articles_with_short_description and Use_dmy_dates_from_June_2020 respectively, but the rest are more magic and not necessarily present in-source.
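The mapping from in-source category links to cl_to values can be sketched in a few lines of Python. This is only a rough approximation under stated assumptions: the function name categories_from_wikitext is made up for illustration, the regular expression handles just the plain [[Category:Name]] and [[Category:Name|sort key]] forms shown above, and it is not how MediaWiki itself parses pages. As noted, categories injected by templates will not be found this way.

```python
import re

def categories_from_wikitext(wikitext):
    """Return category names in database form (underscores instead of spaces).

    Hypothetical helper: only matches literal [[Category:...]] links,
    optionally followed by a "|sort key" part, which is discarded.
    """
    names = re.findall(r"\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]", wikitext)
    return [name.strip().replace(" ", "_") for name in names]

# Source snippet of Category:Computer_data_storage as quoted above.
source = """{{DEFAULTSORT:Storage}}
[[Category:Computer hardware by type]]
[[Category:Computer data|Storage]]
[[Category:Data storage|Computer]]
[[Category:Computer architecture]]"""

print(categories_from_wikitext(source))
```

For the snippet above this yields the same four names that categorylinks holds for cl_from = 711721, in source order rather than alphabetical order.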
So to find all articles and categories directly under a given category title, say en.wikipedia.org/wiki/Category:Mathematics, we can run:
mariadb enwiki -e "select cl_from, cl_to, page_namespace, page_title from categorylinks inner join page on page_namespace in (0, 14) and cl_from = page_id and cl_to = 'Mathematics'"
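The query above returns only direct members. To also descend into subcategories, one option is a recursive common table expression. The following is a minimal sketch, not tested against a real enwiki dump: it uses Python's built-in sqlite3 module with a toy in-memory copy of a few of the rows shown earlier, so that it runs without a MySQL server. The row with page_id -1 is made up so that the recursion has something to find, and the function names direct_members and all_members are hypothetical.

```python
import sqlite3

# Toy in-memory stand-in for the enwiki tables, with only the columns used above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table page(page_id integer primary key, page_namespace integer, page_title text);
create table categorylinks(cl_from integer, cl_to text);
""")
conn.executemany("insert into page values (?, ?, ?)", [
    (5300, 0, "Computer_data_storage"),
    (42371130, 0, "Computer_storage_devices"),
    (711721, 14, "Computer_data_storage"),
    (895945, 14, "Computer_storage_devices"),
    (-1, 0, "Hypothetical_device"),  # made up for illustration, not a real enwiki row
])
conn.executemany("insert into categorylinks values (?, ?)", [
    (5300, "Computer_architecture"),
    (5300, "Computer_data_storage"),
    (711721, "Computer_architecture"),
    (711721, "Data_storage"),
    (895945, "Computer_data_storage"),
    (895945, "Computer_peripherals"),
    (42371130, "Redirects_from_alternative_names"),
    (-1, "Computer_storage_devices"),  # made up for illustration
])

def direct_members(category):
    """Same join as the command above: articles and categories directly under `category`."""
    return conn.execute("""
        select cl_from, page_namespace, page_title
        from categorylinks
        inner join page on cl_from = page_id
        where page_namespace in (0, 14) and cl_to = ?
        order by cl_from
    """, (category,)).fetchall()

def all_members(category):
    """Direct members plus members of subcategories, recursively.

    `union` (rather than `union all`) drops duplicate rows, which also makes
    the recursion terminate if the category graph contains a cycle.
    """
    return conn.execute("""
        with recursive members(page_id, page_namespace, page_title) as (
            select page.page_id, page.page_namespace, page.page_title
            from categorylinks
            inner join page on cl_from = page.page_id
            where page.page_namespace in (0, 14) and cl_to = ?
          union
            select page.page_id, page.page_namespace, page.page_title
            from members
            inner join categorylinks on cl_to = members.page_title
            inner join page on cl_from = page.page_id
            where members.page_namespace = 14
              and page.page_namespace in (0, 14)
        )
        select * from members order by page_id
    """, (category,)).fetchall()

print(direct_members("Computer_data_storage"))
print(all_members("Computer_data_storage"))
```

Only rows in namespace 14 are expanded further, because an article and a category can share a title, as Computer_data_storage does here. MySQL 8.0 and recent MariaDB versions also accept with recursive, but on the full dump the recursion may be slow without suitable indexes.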
PostgreSQL
PostgreSQL feels good.
Had a look at the source tree, and also felt good.
If Oracle is the Microsoft of databases, Postgres is the Linux, and MySQL (or more precisely MariaDB) is the FreeBSD (i.e. the one that got delayed by legal issues). Except that their software licenses were accidentally swapped.
The only problem with Postgres is its name. PostgreSQL is so unpronounceable and so untypeable that you should just call it "Postgres" like everyone else.
SQL histogram
OK, there's a billion questions:
Let's try it on SQLite 3.40.1, Ubuntu 23.04. Data setup:
sqlite3 tmp.sqlite 'create table t(x integer)'
sqlite3 tmp.sqlite <<EOF
insert into t values
  (0),
  (2),
  (2),
  (3),

  (5),
  (6),
  (6),
  (8),
  (9),

  (17);
EOF
sqlite3 tmp.sqlite 'create index tx on t(x)'
For a bin size of 5, ignoring empty ranges, we can run:
sqlite3 tmp.sqlite <<EOF
select floor(x/5)*5 as x,
       count(*) as cnt
from t
group by 1
order by 1
EOF
which produces the desired:
0|4
5|5
15|1
And to consider empty ranges we can use generate_series as per stackoverflow.com/questions/72367652/populating-empty-bins-in-a-histogram-generated-using-sql:
sqlite3 tmp.sqlite <<EOF
select x, sum(cnt) from (
  select floor(x/5)*5 as x,
         count(*) as cnt
    from t
    group by 1
  union
  select *, 0 as cnt from generate_series(0, 15, 5)
)
group by x
EOF
which outputs the desired:
0|4
5|5
10|0
15|1
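Where generate_series is not available (it is an extension that ships with the sqlite3 CLI, and Python's built-in sqlite3 module typically does not load it), a recursive common table expression can produce the bin starts instead. The sketch below is an alternative to the query above rather than a reproduction of it: it swaps union plus sum for a left join plus count, relies on integer division instead of floor (so it assumes non-negative integer x), and the function name histogram is made up for illustration.

```python
import sqlite3

# Same data as the setup above, in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("create table t(x integer)")
conn.executemany(
    "insert into t values (?)",
    [(v,) for v in (0, 2, 2, 3, 5, 6, 6, 8, 9, 17)],
)

def histogram(bin_size, lo, hi):
    """Count rows of t per bin of width bin_size, including empty bins.

    Bin starts run from lo to hi inclusive in steps of bin_size.
    Assumes x is a non-negative integer, so x / bin_size truncates like floor.
    """
    return conn.execute("""
        with recursive bins(x) as (
            select :lo
            union all
            select x + :size from bins where x + :size <= :hi
        )
        select bins.x, count(t.x)
        from bins
        left join t on (t.x / :size) * :size = bins.x
        group by bins.x
        order by bins.x
    """, {"lo": lo, "size": bin_size, "hi": hi}).fetchall()

for start, cnt in histogram(5, 0, 15):
    print(f"{start}|{cnt}")
```

count(t.x) counts only rows matched by the left join, so bins with no data come out as 0 and no dummy zero rows need to be summed.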
SQL tree traversal