Type of database

Table of contents
- NoSQL
  - LevelDB
    - Dump LevelDB
      - LevelDBDumper
  - MongoDB
    - Install MongoDB on Ubuntu
- Relational database
  - Relational database management system
    - SQL

LevelDB

One "LevelDB" database contains multiple files in a directory. Off the bat, that is inferior to SQLite, which stores everything in a single file!

LevelDBDumper

 0  0

github.com/mdawsonuk/LevelDBDumper

The installation instructions at github.com/mdawsonuk/LevelDBDumper/tree/e750a27ff58443ecc410b5c16abbdc539d617387#installation worked on Ubuntu 23.10. The installation was annoying, but it worked: github.com/mdawsonuk/LevelDBDumper/issues/13

Initial issues off-the-bat:

MongoDB

 0  0

List databases:

echo 'show dbs' | mongo

Delete database:

use mydb
db.dropDatabase()

or:

echo 'db.dropDatabase()' | mongo mydb

View collections within a database:

echo 'db.getCollectionNames()' | mongo mydb

Show all data from one of the collections: stackoverflow.com/questions/24985684/mongodb-show-all-contents-from-all-collections

echo 'db.collectionName.find()' | mongo mydb

Install MongoDB on Ubuntu

 0  0

Tested as of Ubuntu 20.04: there is no MongoDB package in the default repositories due to MongoDB's change to the Server Side Public License, which Debian opposed. Therefore, you have to add MongoDB's own APT repository as described at: docs.mongodb.com/manual/tutorial/install-mongodb-on-ubuntu/

Relational database management system (RDBMS)

 0  0

SQL example

 0  0

We have some runnable SQL examples with assertions under the sequelize/raw directory.

These examples are written with the Sequelize library using raw queries.

Sequelize is used minimally, just to feed raw queries transparently to any underlying database and to get minimally parsed results out, which we then assert with standard JavaScript. The queries themselves are all written by hand.

By default the examples run on SQLite. Just like in the Sequelize examples, you can set the database at runtime as:

./index.js or ./index.js l: SQLite
./index.js p: PostgreSQL. You must manually create a database called tmp and ensure that peer authentication works for it

Here we list only examples which we believe are standard SQL, and should therefore work across different SQL implementations:

nodejs/sequelize/raw/index.js: basic hello world to demonstrate the setup and very simple functionality
nodejs/sequelize/raw/many_to_many.js: illustrates many-to-many relations with JOIN. Contains:
SQL transaction examples:
- nodejs/sequelize/raw/commit_error.js: stackoverflow.com/questions/27245101/why-should-we-use-rollback-in-sql-explicitly/27245234#27245234 and stackoverflow.com/questions/48277519/how-to-use-commit-and-rollback-in-a-postgresql-function/48277708#48277708 suggest that on PostgreSQL, once something fails inside a transaction, all further queries in the current transaction are ignored, and COMMIT simply does a ROLLBACK. Let's check. Yup: true for Postgres, but false for SQLite. SQLite just happily runs anything it can, so you really need an explicit ROLLBACK for it.
- SQL isolation level example
GROUP BY and SQL aggregate functions:
- nodejs/sequelize/raw/group_by_extra_column.js: let's see if it blows up or not on different DB systems, sqlite3 Node.js package allows it:
  - github.com/sequelize/sequelize/issues/5481#issuecomment-964387232
  - dba.stackexchange.com/questions/141594/how-select-column-does-not-list-in-group-by-clause/141600 says that it was allowed in SQL:1999 when there are no ambiguities due to constraints, e.g. when grouping by unique columns
  - github.com/postgres/postgres/blob/REL_13_5/src/test/regress/sql/functional_deps.sql#L27 shows that PostgreSQL wants it to work for UNIQUE NOT NULL, but they just haven't implemented it as of 13.5, where it only works if you group by PRIMARY KEY
  - dba.stackexchange.com/questions/158015/why-can-i-select-all-fields-when-grouping-by-primary-key-but-not-when-grouping-b also says that UNIQUE NOT NULL doesn't work. Dan Lenski then points to a rationale mailing list thread.
- nodejs/sequelize/raw/group_by_max_full_row.js: here we try to get the full row of each group at which a given column reaches the max of the group
  - Postgres: has SELECT DISTINCT ON which works perfectly if you only want one row in case of multiple rows attaining the max. ON is an extension to the standard unfortunately: www.postgresql.org/docs/9.3/sql-select.html#SQL-DISTINCT Docs specify that it always respects ORDER BY when selecting the row.
    stackoverflow.com/questions/586781/postgresql-fetch-the-row-which-has-the-max-value-for-a-column asks it without the multiple matches use case
    stackoverflow.com/questions/586781/postgresql-fetch-the-rows-which-have-the-max-value-for-a-column-in-each-group/587209#587209 also present in simpler form at stackoverflow.com/questions/121387/fetch-the-rows-which-have-the-max-value-for-a-column-for-each-distinct-value-of/123481#123481 gives a very nice OUTER JOIN only solution! Incredible, very elegant.
    dba.stackexchange.com/questions/171938/get-only-rows-with-max-group-value asks specifically the case of multiple matches to the max
  - SQLite:
    stackoverflow.com/questions/48326957/row-with-max-value-per-group-sqlite
    stackoverflow.com/questions/48326957/row-with-max-value-per-group-sqlite/48328243#48328243 teaches us that in SQLite min and max are magic and guarantee that the matching row is returned
    stackoverflow.com/questions/48326957/row-with-max-value-per-group-sqlite/72996649#72996649 Ciro Santilli uses the magic of ROW_NUMBER
    stackoverflow.com/questions/17277152/sqlite-select-distinct-of-one-column-and-get-the-others/71924314#71924314 get any full row without specifying which, we teach how to specify
    code.djangoproject.com/ticket/22696 WONTFIXed DISTINCT ON
    stackoverflow.com/questions/50846722/what-is-the-difference-between-postgres-distinct-vs-distinct-on/72997494#72997494 DISTINCT vs DISTINCT ON, somewhat related question
  - stackoverflow.com/questions/5803032/group-by-to-return-entire-row asks how to take the top N with distinct after order limit. I don't know how to do it in Postgres
- nodejs/sequelize/raw/most_frequent.js: illustrates a few variants of finding the mode, including across GROUP
  - stackoverflow.com/questions/12235595/find-most-frequent-value-in-sql-column/72979899#72979899
- nodejs/sequelize/raw/group_by_max_n.js: get the top N in each group
order results in the same order as IN:
- MySQL: stackoverflow.com/questions/396748/ordering-by-the-order-of-values-in-a-sql-in-clause
- PostgreSQL:
  - stackoverflow.com/questions/866465/order-by-the-in-value-list
  - dba.stackexchange.com/questions/59394/order-of-returned-rows-with-in-statement
LIMIT by a running total: TODO links
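
As a rough illustration of the group_by_max_full_row.js discussion above, here is a minimal sketch in Python with the standard library sqlite3 module instead of Sequelize. The sales table, its columns and the function name are hypothetical and made up for this sketch. It compares SQLite's "magic" bare columns with MAX against the ROW_NUMBER approach, which assumes SQLite 3.25 or later.

```python
import sqlite3


def max_rows_per_group():
    """Return the full row holding the max amount of each region, computed two ways."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sales (region TEXT, item TEXT, amount INTEGER);
        INSERT INTO sales VALUES
            ('north', 'apple', 10), ('north', 'pear', 30),
            ('south', 'apple', 25), ('south', 'pear', 5);
    """)
    # SQLite-specific: with MAX(), bare columns such as item are taken
    # from the row that attains the maximum.
    bare = conn.execute(
        "SELECT region, item, MAX(amount) FROM sales "
        "GROUP BY region ORDER BY region"
    ).fetchall()
    # Window function variant: number the rows of each region by
    # descending amount and keep only the first one.
    windowed = conn.execute("""
        SELECT region, item, amount FROM (
            SELECT *, ROW_NUMBER() OVER (
                PARTITION BY region ORDER BY amount DESC
            ) AS rn FROM sales
        ) WHERE rn = 1 ORDER BY region
    """).fetchall()
    conn.close()
    return bare, windowed


if __name__ == "__main__":
    print(max_rows_per_group())
```

Both queries pick one row per group. The bare-column form relies on documented SQLite behaviour and should not be expected to work on other implementations.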

SQL implementation

 0  0

IBM Db2

 0  0

MySQL

 0  0

sudo mysql

works on Ubuntu 20.10.

Create user for further logins without sudo askubuntu.com/questions/915585/how-to-login-mysql-shell-when-mysql-have-no-password/1325689#1325689:

sudo mysql -e "CREATE USER $USER"

Run command from CLI stackoverflow.com/questions/1602904/how-do-you-run-a-single-query-through-mysql-from-the-command-line

mysql -e 'SHOW DATABASES'

Create test user with password:

sudo mysql -e 'CREATE USER user0 IDENTIFIED WITH mysql_native_password BY "a"'
sudo mysql -e 'GRANT ALL PRIVILEGES ON database_name.* TO "user0"'

and login as that user:

mysql -u user0 -p

mysql -u user0 -pmypassword

The IDENTIFIED WITH mysql_native_password part is to overcome "Client does not support authentication protocol requested by server" when connecting from Node.js.

List users:

sudo mysql -e 'SELECT * FROM mysql.user'

View permissions for each user on each DB: serverfault.com/questions/263868/how-to-know-all-the-users-that-can-access-a-database-mysql

sudo mysql -e 'SELECT * FROM mysql.db'

List databases:

sudo mysql -e 'SHOW DATABASES'

Create database:

sudo mysql -e 'CREATE DATABASE mydb0'

Destroy database:

sudo mysql -e 'DROP DATABASE mydb0'

Show tables in database:

sudo mysql -e 'SHOW TABLES' mydb0

or:

sudo mysql -e 'SHOW TABLES FROM mydb0'

mysqldump

 0  0

dev.mysql.com/doc/refman/8.0/en/mysqldump.html

mysqldump to CSV

 0  0

MariaDB

 0  0

Dude's a legend. Sells company for a few million. Then forks the open source project next year. Love it.

PostgreSQL (Postgres)

 0  0

PostgreSQL feels good.

Its feature set is insanely large! Just look at stuff like: stackoverflow.com/questions/1986491/sql-split-string-by-space-into-table-in-postgresql/1993058#1993058

Had a look at the source tree, and also felt good.

If Oracle is the Microsoft of databases, Postgres is the Linux, and MySQL (or more precisely MariaDB) is the FreeBSD (i.e. the one that got delayed by legal issues). Except that their software licenses were accidentally swapped.

The only problem with Postgres is its name. PostgreSQL is so unpronounceable and so untypeable that you should just call it "Postgres" like everyone else.

PostgreSQL getting started

 0  0

On Ubuntu 20.10 PostgreSQL 12.6, login with psql on my default username without sudo fails with: stackoverflow.com/questions/11919391/postgresql-error-fatal-role-username-does-not-exist

This is the one that worked on Ubuntu 21.04: stackoverflow.com/questions/11919391/postgresql-error-fatal-role-username-does-not-exist/38444152#38444152

sudo -u postgres createuser -s $(whoami)
createdb $(whoami)

Explanation:

sudo -u postgres uses the postgres user via peer authentication
-s in createuser -s: make it a superuser
createdb: creates a database with the same name as the user. psql connects by default to a database named after the current user, so login fails if that database does not exist.

You can now run psql without any password. This works without password due to peer authentication:

sudo cat /etc/postgresql/12/main/pg_hba.conf

shows that peer authentication is available to all users apparently:

local   all             postgres                                peer

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     peer

List users:

psql -c '\du'

output:

                                    List of roles
  Role name  |                         Attributes                         | Member of
-------------+------------------------------------------------------------+-----------
 ciro        | Superuser, Create role, Create DB                          | {}
 owning_user |                                                            | {}
 postgres    | Superuser, Create role, Create DB, Replication, Bypass RLS | {}

Delete user later on:

psql -c 'DROP USER username;'

Create a database:

createdb testdb0

Help toplevel:

help

Get help for Postgres commands such as \h and so on:

\?

List supported SQL commands:

\h

Show syntax for one type of command:

\h SELECT

List all databases:

psql -c '\l'

which shows:

    Name     |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-------------+----------+----------+-------------+-------------+-----------------------
 ciro        | postgres | UTF8     | en_GB.UTF-8 | en_GB.UTF-8 |
 postgres    | postgres | UTF8     | en_GB.UTF-8 | en_GB.UTF-8 |
 template0   | postgres | UTF8     | en_GB.UTF-8 | en_GB.UTF-8 | =c/postgres          +
             |          |          |             |             | postgres=CTc/postgres
 template1   | postgres | UTF8     | en_GB.UTF-8 | en_GB.UTF-8 | =c/postgres          +
             |          |          |             |             | postgres=CTc/postgres
 testdb0     | postgres | UTF8     | en_GB.UTF-8 | en_GB.UTF-8 |
(6 rows)

Delete a database:

psql -c 'DROP DATABASE "testdb0";'

If you didn't give a database from the command line e.g.:

psql

you can do that afterwards with:

\c testdb0

Let's create a table and test that it is working:

psql testdb0 -c 'CREATE TABLE table0 (int0 INT, char0 CHAR(16));'

List tables, no special tables:

psql testdb0 -c '\dt'

gives:

        List of relations
 Schema |  Name  | Type  | Owner
--------+--------+-------+-------
 public | table0 | table | ciro
(1 row)

View table schema: stackoverflow.com/questions/109325/postgresql-describe-table

psql testdb0 -c '\d+ table0'

output:

                                      Table "public.table0"
 Column |     Type      | Collation | Nullable | Default | Storage  | Stats target | Description
--------+---------------+-----------+----------+---------+----------+--------------+-------------
 int0   | integer       |           |          |         | plain    |              |
 char0  | character(16) |           |          |         | extended |              |

Insert some data into it and get the data out:

psql testdb0 -c "INSERT INTO table0 (int0, char0) VALUES (2, 'two'), (3, 'three'), (5, 'five'), (7, 'seven');"
psql testdb0 -c 'SELECT * FROM table0;'

output:

 int0 |      char0
------+------------------
    2 | two
    3 | three
    5 | five
    7 | seven
(4 rows)

Delete the table:

psql testdb0 -c 'DROP TABLE table0;'

PostgreSQL HOWTO

 0  0

output one column per line: stackoverflow.com/questions/9604723/alternate-output-format-for-psql-showing-one-column-per-line-with-column-name
PostgreSQL does not automatically index foreign keys! stackoverflow.com/questions/970562/postgres-and-indexes-on-foreign-keys-and-primary-keys

PostgreSQL create test data

 0  0

Generate random text in PostgreSQL

 0  0

This one is good: stackoverflow.com/questions/36533429/generate-random-string-in-postgresql/44200391#44200391 as it also describes how to generate multiple values.

with symbols(characters) as (VALUES ('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'))
select string_agg(substr(characters, (random() * (length(characters) - 1) + 1)::INTEGER, 1), '')
from symbols
join generate_series(1,8) as word(chr_idx) on 1 = 1 -- word length
join generate_series(1,10000) as words(idx) on 1 = 1 -- # of words
group by idx;

Then you can insert it into a row with:

create table tmp(s text);
insert into tmp(s)
  select s from
  (
    with symbols(characters) as (VALUES ('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'))
    select string_agg(substr(characters, (random() * (length(characters) - 1) + 1)::INTEGER, 1), '') as asdf
    from symbols
    join generate_series(1,8) as word(chr_idx) on 1 = 1 -- word length
    join generate_series(1,10000) as words(idx) on 1 = 1 -- # of words
    group by idx
  ) as sub(s);

A more convenient approach is likely to define the function:

CREATE OR REPLACE FUNCTION random_string(int) RETURNS TEXT as $$
select
  string_agg(substr(characters, (floor(random() * length(characters)) + 1)::integer, 1), '') as random_word
from (values('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789    --')) as symbols(characters)
  join generate_series(1, $1) on 1 = 1
$$ language sql;

And then:

create table tmp(s text, t text);
insert into tmp(s) select random_string(10) from generate_series(1, 10);

PostgreSQL full-text search

 0  0

This section was tested on Ubuntu 24.10, PostgreSQL 16.6.

Let's create some test data like this:

time psql tmp -c 'DROP TABLE IF EXISTS fts;'
time psql tmp -c 'CREATE TABLE fts(s TEXT, i INTEGER);'
time psql tmp <<'EOF'
INSERT INTO fts SELECT
  i::text || ' ' ||
    (i * 2  )::text || ' ' ||
    (i * 5  )::text || ' ' ||
    (i * 7  )::text || ' ' ||
    (i * 11 )::text || ' ' ||
    (i * 13 )::text || ' ' ||
    (i * 17 )::text || ' ' ||
    (i * 23 )::text || ' ' ||
    (i * 29 )::text || ' ' ||
    (i * 31 )::text
  ,
  i % 100
FROM generate_series(1::bigint, 100000000::bigint) AS s(i);
EOF

The creation time was 2m13s, and the final size was:

    table_name    | pg_size_pretty | pg_total_relation_size
------------------+----------------+------------------------
 fts              | 13 GB          |            14067326976

With this test data it is simple to predict what each row contains, so we can make educated queries, while also posing some difficulty to the RDBMS. As per:

time psql tmp -c 'SELECT * FROM fts LIMIT 10;'

the first rows look like:

                  s                  | i
-------------------------------------+----
 1 2 5 7 11 13 17 23 29 31           |  1
 2 4 10 14 22 26 34 46 58 62         |  2
 3 6 15 21 33 39 51 69 87 93         |  3
 4 8 20 28 44 52 68 92 116 124       |  4
 5 10 25 35 55 65 85 115 145 155     |  5
 6 12 30 42 66 78 102 138 174 186    |  6
 7 14 35 49 77 91 119 161 203 217    |  7
 8 16 40 56 88 104 136 184 232 248   |  8
 9 18 45 63 99 117 153 207 261 279   |  9
 10 20 50 70 110 130 170 230 290 310 | 10

We aimed to create a test table of size around 10 GB, as in practice it is around that order of size that index speedups start to become very obvious on an SSD-based system.

Before we create the index, let's see if our non-indexed queries are slow enough for our tests:

time psql tmp -c "SELECT * FROM fts WHERE s LIKE '% 50000000 %';"

which gives:

                                                 s                                                 | i
---------------------------------------------------------------------------------------------------+---
 10000000 20000000 50000000 70000000 110000000 130000000 170000000 230000000 290000000 310000000   | 0
 25000000 50000000 125000000 175000000 275000000 325000000 425000000 575000000 725000000 775000000 | 0
(2 rows)


real    0m11.758s
user    0m0.017s
sys     0m0.008s

so it should be enough to observe the index speedup.

Now let's create the index. First we create a generated column that splits the strings with to_tsvector, and then we index that split column:

time psql tmp <<'EOF'
ALTER TABLE fts ADD COLUMN s_ts tsvector
  GENERATED ALWAYS AS (to_tsvector('english', s)) STORED;
EOF
time psql tmp -c 'CREATE INDEX s_ts_gin_idx ON fts USING GIN (s_ts);'

These commands took 8m51s and 40m8s and the DB size went up about 5x:

    table_name    | pg_size_pretty | pg_total_relation_size
------------------+----------------+------------------------
 fts              | 69 GB          |            74487758848

And finally let's try out the index:

time psql tmp -c "SELECT s, i FROM fts WHERE s_ts @@ to_tsquery('english', '50000000');"

which "instantly" gives us in 0m0.129s:

                                                   s                                                   | i
-------------------------------------------------------------------------------------------------------+---
 10000000 20000000 50000000 70000000 110000000 130000000 170000000 230000000 290000000 310000000       | 0
 25000000 50000000 125000000 175000000 275000000 325000000 425000000 575000000 725000000 775000000     | 0
 50000000 100000000 250000000 350000000 550000000 650000000 850000000 1150000000 1450000000 1550000000 | 0

so the index worked!

We understand from this that it only finds exact word hits.
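
Since every row follows a fixed formula, the expected hits can be predicted without touching the database. The sketch below is a hypothetical Python model of the test data, not of PostgreSQL itself, and all names in it are made up here. It also suggests why the earlier LIKE '% 50000000 %' query returned two rows while the indexed query returned three: in row 50000000 the token comes first, so it has no leading space.

```python
# Multipliers used to build column s, and the number of generated rows.
MULTIPLIERS = (1, 2, 5, 7, 11, 13, 17, 23, 29, 31)
N_ROWS = 100_000_000


def make_s(i):
    """Rebuild the text column s for row i."""
    return " ".join(str(i * m) for m in MULTIPLIERS)


def rows_with_token(token):
    """Rows i in 1..N_ROWS for which some i * m equals token exactly."""
    return sorted(
        {token // m for m in MULTIPLIERS
         if token % m == 0 and 1 <= token // m <= N_ROWS}
    )


def like_matches(token):
    """Subset of those rows that LIKE '% token %' would match."""
    needle = f" {token} "
    return [i for i in rows_with_token(token) if needle in make_s(i)]


if __name__ == "__main__":
    print(rows_with_token(50_000_000))
    print(like_matches(50_000_000))
```

The model assumes that each number in s is indexed as its own unchanged lexeme.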

Another important use case is to search for prefixes of words, e.g. as you'd want in a simple autocompletion system. This can be achieved by adding :* at the end of the search term as in:

time psql tmp -c "SELECT s, i FROM fts WHERE s_ts @@ to_tsquery('english', '50000000:*');"

This finishes in the same amount of time, and gives:

                                                     s                                                     | i
-----------------------------------------------------------------------------------------------------------+----
 10000000 20000000 50000000 70000000 110000000 130000000 170000000 230000000 290000000 310000000           |  0
 38461539 76923078 192307695 269230773 423076929 500000007 653846163 884615397 1115384631 1192307709       | 39
 45454546 90909092 227272730 318181822 500000006 590909098 772727282 1045454558 1318181834 1409090926      | 46
 50000000 100000000 250000000 350000000 550000000 650000000 850000000 1150000000 1450000000 1550000000     |  0
 71428572 142857144 357142860 500000004 785714292 928571436 1214285724 1642857156 2071428588 2214285732    | 72
 100000000 200000000 500000000 700000000 1100000000 1300000000 1700000000 2300000000 2900000000 3100000000 |  0
 29411765 58823530 147058825 205882355 323529415 382352945 500000005 676470595 852941185 911764715         | 65
 25000000 50000000 125000000 175000000 275000000 325000000 425000000 575000000 725000000 775000000         |  0

so now we have cool hits such as 500000000, 500000004, 500000005, 500000007 and 500000006. The syntax is also mentioned at:

www.postgresql.org/docs/17/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES
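
The prefix hits can also be predicted from the formula that generated the data. This is a hypothetical Python sketch with made-up helper names, assuming each number is indexed as its own lexeme: it enumerates every possible token starting with the prefix and works out which rows could contain it.

```python
# Multipliers used to build column s, and the number of generated rows.
MULTIPLIERS = (1, 2, 5, 7, 11, 13, 17, 23, 29, 31)
N_ROWS = 100_000_000
MAX_TOKEN = N_ROWS * max(MULTIPLIERS)


def tokens_with_prefix(prefix):
    """Yield every integer <= MAX_TOKEN whose decimal form starts with prefix."""
    scale = 1
    while prefix * scale <= MAX_TOKEN:
        last = min((prefix + 1) * scale - 1, MAX_TOKEN)
        for token in range(prefix * scale, last + 1):
            yield token
        scale *= 10


def rows_with_prefix(prefix):
    """Rows i in 1..N_ROWS that contain some token starting with prefix."""
    rows = set()
    for token in tokens_with_prefix(prefix):
        for m in MULTIPLIERS:
            if token % m == 0 and 1 <= token // m <= N_ROWS:
                rows.add(token // m)
    return sorted(rows)


if __name__ == "__main__":
    print(rows_with_prefix(50_000_000))
```

For the prefix 50000000 this yields eight rows, the same eight shown in the query output above.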

Next we can also try some other queries with multiple terms. Text must contain two words with &:

time psql tmp -c "SELECT s, i FROM fts WHERE s_ts @@ to_tsquery('english', '50000000 & 175000000');"

gives:

                                                   s                                                   | i
-------------------------------------------------------------------------------------------------------+---
 25000000 50000000 125000000 175000000 275000000 325000000 425000000 575000000 725000000 775000000     | 0

Text can contain either word with |:

time psql tmp -c "SELECT s, i FROM fts WHERE s_ts @@ to_tsquery('english', '50000000 | 175000000');"

gives:

                                                    s                                                    | i
---------------------------------------------------------------------------------------------------------+---
 10000000 20000000 50000000 70000000 110000000 130000000 170000000 230000000 290000000 310000000         | 0
 50000000 100000000 250000000 350000000 550000000 650000000 850000000 1150000000 1450000000 1550000000   | 0
 87500000 175000000 437500000 612500000 962500000 1137500000 1487500000 2012500000 2537500000 2712500000 | 0
 25000000 50000000 125000000 175000000 275000000 325000000 425000000 575000000 725000000 775000000       | 0
 35000000 70000000 175000000 245000000 385000000 455000000 595000000 805000000 1015000000 1085000000     | 0

Text can contain the given words sequentially:

time psql tmp -c "SELECT s, i FROM fts WHERE s_ts @@ to_tsquery('english', '50000000 <-> 125000000 <-> 175000000');"

gives:

                                                   s                                                   | i
-------------------------------------------------------------------------------------------------------+---
 25000000 50000000 125000000 175000000 275000000 325000000 425000000 575000000 725000000 775000000     | 0

We can also inspect how words were split by simply doing a SELECT * again:

             s              | i |                                 s_ts
----------------------------+---+----------------------------------------------------------------------
1 2 5 7 11 13 17 23 29 31   | 1 | '1':1 '11':5 '13':6 '17':7 '2':2 '23':8 '29':9 '31':10 '5':3 '7':4
2 4 10 14 22 26 34 46 58 62 | 2 | '10':3 '14':4 '2':1 '22':5 '26':6 '34':7 '4':2 '46':8 '58':9 '62':10
3 6 15 21 33 39 51 69 87 93 | 3 | '15':3 '21':4 '3':1 '33':5 '39':6 '51':7 '6':2 '69':8 '87':9 '93':10

Let's check if the index updates automatically when we do an insert and if insertion seems to have been significantly slowed down by the index:

time psql tmp -c "INSERT INTO fts VALUES ('abcd efgh', 99)"

finishes in:

real    0m0.043s
user    0m0.014s
sys     0m0.010s

so performance is OK. Presumably, the insertion time is proportional to the number of tokens, doing one logarithmic operation per token, so indexing short chunks of text like titles is easy. And then let's find it:

time psql tmp -c "SELECT s, i FROM fts WHERE s_ts @@ to_tsquery('english', 'efgh');"

which finds it with:

     s     | i
-----------+----
 abcd efgh | 99

so we are all good. Unfortunately, accurate performance benchmarking is a bit harder than that, as the index by default first collects a certain number of updates into memory into the "pending list", before actually inserting them all at once after a certain mass is reached, as documented at: www.postgresql.org/docs/17/gin.html#GIN-IMPLEMENTATION. We are not going that deep today.

The next thing that we need to understand is how to_tsvector tokenizes strings for the english language. For example running:

psql -c "select to_tsvector('english', 'A Dog run runs fast faster two Cats: b c to from 1 é befhyph-afthyph.')"

gives:

'1':13
'afthyph':17
'b':9
'befhyph':16
'befhyph-afthyph':15
'c':10
'cat':8
'dog':2
'fast':5
'faster':6
'run':3,4
'two':7
'é':14

so we understand some of the heuristic normalizations:

prepositions like to and from are gone. These are called stopwords as documented at: www.postgresql.org/docs/17/textsearch-controls.html#TEXTSEARCH-PARSING-DOCUMENTS
words are lowercased and singularized, e.g. Cats becomes cat
hyphenated words are stored both in separate components and in the full hyphenated form:
- 'afthyph':17
- 'befhyph':16
- 'befhyph-afthyph':15

The full list of languages available can be obtained with:

psql -c '\dF'

On Ubuntu 24.10, the list contains major world languages, plus the special simple configuration such that:

psql -c "select to_tsvector('simple', 'A Dog run runs fast faster two Cats: b c to from 1 é befhyph-afthyph.')"

gives:

'1':13
'a':1
'afthyph':17
'b':9
'befhyph':16
'befhyph-afthyph':15
'c':10
'cats':8
'dog':2
'fast':5
'faster':6
'from':12
'run':3
'runs':4
'to':11
'two':7
'é':14

so we understand that it is similar to english but it does not:

seem to have any stopwords
do singularization normalization
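
The observations above can be captured in a toy Python model of the simple configuration. This is only a sketch that reproduces the sample outputs shown in this section. simple_tsvector and format_tsvector are hypothetical names, and PostgreSQL's real parser recognizes many more token types than this.

```python
def simple_tsvector(text):
    """Map each lowercased token to its list of 1-based positions."""
    positions = {}
    pos = 0
    for raw in text.split():
        word = raw.strip(".,:;!?").lower()
        if not word:
            continue
        # A hyphenated word is stored whole first, then once per component.
        parts = [word]
        if "-" in word:
            parts += [p for p in word.split("-") if p]
        for part in parts:
            pos += 1
            positions.setdefault(part, []).append(pos)
    # Lexemes come out sorted, as in the psql output.
    return dict(sorted(positions.items()))


def format_tsvector(vector):
    """Render the mapping in the 'lexeme':positions notation."""
    return " ".join(
        f"'{lexeme}':{','.join(map(str, where))}"
        for lexeme, where in vector.items()
    )


if __name__ == "__main__":
    print(format_tsvector(simple_tsvector("1 2 5 7 11 13 17 23 29 31")))
```

No stopword removal or stemming happens here, which is the difference from english noted above.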

From the query side of things, if the query is going to be open to end users on a web interface, we need to understand to_tsquery better. The issue is that to_tsquery is quite brutal and happily throws errors for common things users might do e.g. spaces:

select to_tsquery('english', 'abc def');

giving:

ERROR:  syntax error in tsquery: "abc def"

To avoid such errors, we can use:

plainto_tsquery: ANDs everything
websearch_to_tsquery: supports AND with spaces, OR with or, word negation with -word and concatenation with "my word". But it unfortunately does not support prefixing, which is what everyone and their mother wants for autocomplete: stackoverflow.com/questions/14103880/escaping-special-characters-in-to-tsquery#comment78452351_41804957
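
One possible workaround for the missing prefix support is to build the to_tsquery string yourself from sanitized user input. The sketch below is an assumption about how that could look, not an established recipe. to_prefix_tsquery is a hypothetical helper, and keeping only word characters is a deliberately blunt way to avoid tsquery syntax errors.

```python
import re


def to_prefix_tsquery(user_input):
    """Turn free-form user text into an AND of prefix terms for to_tsquery.

    Everything that is not a word character is discarded, so operators
    such as & | ! : ( ) typed by the user cannot cause a syntax error.
    """
    words = re.findall(r"\w+", user_input.lower())
    return " & ".join(f"{word}:*" for word in words)


if __name__ == "__main__":
    print(to_prefix_tsquery("abc def"))
```

The result should still be passed to to_tsquery as a bound query parameter rather than spliced into the SQL text, and an empty result is best handled by skipping the search altogether.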

Bibliography:

Also posted at:

www.reddit.com/r/PostgreSQL/comments/12yld1o/comment/m3l5nkv/ "Is it worth using Postgres' builtin full-text search or should I go straight to Elastic?", high top Google result for "PostgreSQL full text search" as of 2024. Random, but it's there.

Create a test user in PostgreSQL

 0  0

To create a test user that authenticates with a password instead of peer authentication:

createuser -P user0
createdb user0

-P makes it prompt for the user's password.

Alternatively, to create the password non-interactively stackoverflow.com/questions/42419559/postgres-createuser-with-password-from-terminal:

psql -c "create role NewRole with login password 'secret'"

Can't find a way using the createuser helper.

We can then login with that password with:

psql -U user0 -h localhost

which asks for the password we've just set, because the -h option turns off peer authentication and turns on password authentication.

The password can be given non-interactively as shown at stackoverflow.com/questions/6405127/how-do-i-specify-a-password-to-psql-non-interactively with the PGPASSWORD environment variable:

PGPASSWORD=a psql -U user0 -h localhost

Now let's create a test database which user0 can access with an existing superuser account:

createdb user0db0
psql -c 'GRANT ALL PRIVILEGES ON DATABASE user0db0 TO user0'

We can check this permission with:

psql -c '\l'

which now contains:

                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 user0db0  | ciro     | UTF8     | en_GB.UTF-8 | en_GB.UTF-8 | =Tc/ciro             +
           |          |          |             |             | ciro=CTc/ciro        +
           |          |          |             |             | user0=CTc/ciro

The permission letters are explained at:

user0 can now do the usual table operations on that database:

PGPASSWORD=a psql -U user0 -h localhost user0db0 -c 'CREATE TABLE table0 (int0 INT, char0 CHAR(16));'
PGPASSWORD=a psql -U user0 -h localhost user0db0 -c "INSERT INTO table0 (int0, char0) VALUES (2, 'two'), (3, 'three'), (5, 'five'), (7, 'seven');"
PGPASSWORD=a psql -U user0 -h localhost user0db0 -c 'SELECT * FROM table0;'

Peer authentication

 0  0

www.postgresql.org/docs/13/auth-peer.html

Uses the name of the current Linux user to login without a password.

PostgreSQL logging

 0  0

stackoverflow.com/questions/722221/how-to-log-postgresql-queries

On Ubuntu 21.10, PostgreSQL logs at a certain default level to:

/var/log/postgresql/postgresql-13-main.log

but it does not log everything, only/mostly errors it seems.

Setting:

log_statement = 'all'

under:

/etc/postgresql/13/main/postgresql.conf

and then restarting the server:

sudo service postgresql restart

just works.

Realtime monitoring for long queries instead: stackoverflow.com/questions/8597516/app-to-monitor-postgresql-queries-in-real-time

PostgreSQL serialization failure

 0  0

When using SQL REPEATABLE READ isolation level and SQL SERIALIZABLE isolation level, concurrent transactions may fail with a serialization failure, and then you might need to retry them. Your server code or your ORM must always account for that.
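
A retry loop along these lines is one way to account for it. This is a generic Python sketch under stated assumptions: SerializationFailure is a stand-in for whatever exception your driver raises for SQLSTATE 40001, and transaction is a hypothetical callable that runs one whole transaction, rolling back before it raises.

```python
import random
import time


class SerializationFailure(Exception):
    """Stand-in for the driver error that carries SQLSTATE 40001."""


def run_with_retry(transaction, max_attempts=5, base_delay=0.01):
    """Call transaction() until it succeeds or max_attempts is reached.

    Only serialization failures are retried; any other exception
    propagates immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except SerializationFailure:
            if attempt == max_attempts:
                raise
            # Randomized exponential backoff so that competing
            # transactions do not collide again in lockstep.
            time.sleep(base_delay * 2 ** (attempt - 1) * random.random())
```

The whole transaction has to be re-run from the start, including any reads, since the values read by the failed attempt may no longer be valid.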

A good way to explore when it happens is to use the example

PostgreSQL function

 0  0

www.postgresql.org/docs/17/functions-srf.html

PostgreSQL `generate_series`

 0  0

www.postgresql.org/docs/17/functions-srf.html

A pattern you often want when generating test data, see also Generate random text in PostgreSQL:

CREATE TABLE "mytable" ("i" INTEGER, "j" INTEGER);
INSERT INTO "mytable" SELECT i, i*2 FROM generate_series(1, 10) as s(i);

`to_tsvector`

 0  0

www.postgresql.org/docs/17/textsearch-controls.html

Microsoft SQL Server

 0  0

Transact-SQL (T-SQL)

 0  0

Oracle Database

 0  0

Microsoft SQL Server is often known simply as "SQL Server", a terrible thing that makes it impossible to find portable SQL answers on Google! Unfortunately, you just have to Google by specific SQL implementation to find anything about the open source ones.

SQLite

 0  0

The minimalism, serverlessness/lack of temporary caches/lack of permission management, Hipp's religious obsession with efficiency, the use of their own pure Fossil version control. Wait, scrap that last one. Pure beauty!

Official Git mirror: github.com/sqlite/sqlite

Create a table:

sqlite3 db.sqlite3 "
CREATE TABLE 'IntegerNames' (int0 INT, char0 CHAR(16));
INSERT INTO 'IntegerNames' (int0, char0) VALUES (2, 'two'), (3, 'three'), (5, 'five'), (7, 'seven');
"

List tables:

sqlite3 db.sqlite3 '.tables'

output:

IntegerNames

Show schema of a table:

sqlite3 db.sqlite3 '.schema IntegerNames'

outputs the query that would generate that table:

CREATE TABLE IF NOT EXISTS 'IntegerNames' (int0 INT, char0 CHAR(16));

Show all data in a table:

sqlite3 db.sqlite3 'SELECT * FROM IntegerNames'

output:

2|two
3|three
5|five
7|seven
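
The same session can be reproduced from Python with the standard library sqlite3 module. This is a minimal sketch: the function name is hypothetical, and it defaults to an in-memory database instead of db.sqlite3 so that it leaves no file behind.

```python
import sqlite3


def integer_names_demo(path=":memory:"):
    """Create and fill IntegerNames, then return (tables, schema, rows)."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IntegerNames (int0 INT, char0 CHAR(16));
        INSERT INTO IntegerNames (int0, char0)
            VALUES (2, 'two'), (3, 'three'), (5, 'five'), (7, 'seven');
    """)
    # Rough equivalent of .tables: table names live in sqlite_master.
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    # Rough equivalent of .schema: the stored CREATE statement.
    schema = conn.execute(
        "SELECT sql FROM sqlite_master WHERE name = 'IntegerNames'"
    ).fetchone()[0]
    rows = conn.execute("SELECT * FROM IntegerNames").fetchall()
    conn.close()
    return tables, schema, rows


if __name__ == "__main__":
    for item in integer_names_demo():
        print(item)
```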

SQLite import CSV

 0  0

SQLite import CSV from stdin

 0  0

SQLite import JSON

 0  0

stackoverflow.com/questions/46407770/how-to-convert-a-json-file-to-an-sqlite-database
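
One simple way to do it is with Python's standard json and sqlite3 modules. The sketch below makes several assumptions: the JSON is a flat array of objects, the keys of the first object define the columns, and the table and column names come from a trusted source. import_json_records is a hypothetical helper name.

```python
import json
import sqlite3


def import_json_records(conn, table, json_text):
    """Create table from a JSON array of flat objects and insert every record."""
    records = json.loads(json_text)
    columns = list(records[0])
    # Identifiers cannot be bound as parameters, so they are quoted and
    # interpolated; only the values go through ? placeholders.
    quoted = ", ".join(f'"{column}"' for column in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE "{table}" ({quoted})')
    conn.executemany(
        f'INSERT INTO "{table}" ({quoted}) VALUES ({placeholders})',
        [tuple(record.get(column) for column in columns) for record in records],
    )
    conn.commit()
    return len(records)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    import_json_records(conn, "people", '[{"name": "ada", "age": 36}]')
    print(conn.execute("SELECT * FROM people").fetchall())
```

Columns are created without declared types, which SQLite accepts. Nested objects would need to be flattened or stored as text first.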

SQLite benchmark

 0  0

Python sequence test data generation: stackoverflow.com/questions/18219779/bulk-insert-huge-data-into-sqlite-using-python/76659706#76659706

SQLite C extension

 0  0

Example: sqlite/ip.c, adapted from www.sqlite.org/loadext.html, also explained at: stackoverflow.com/questions/7638238/sqlite-ip-address-storage/76520885#76520885.

Sample usage in the test program: sqlite/test.sh.

Docs: www.sqlite.org/loadext.html

SQLite isolation levels

 0  0

www.sqlite.org/pragma.html#pragma_read_uncommitted mentions:

The default isolation level for SQLite is SERIALIZABLE

It does not appear possible to achieve the other two levels besides SERIALIZABLE and READ UNCOMMITTED.

www.sqlite.org/isolation.html

Node.js SQLite bindings

 0  0

`sqlite3` Node.js package

 0  0

Includes its own copy of sqlite3, you don't use the system one, which is good to ensure compatibility. The version is shown at: github.com/mapbox/node-sqlite3/blob/918052b538b0effe6c4a44c74a16b2749c08a0d2/deps/common-sqlite.gypi#L3 SQLite source is tracked compressed in-tree: github.com/mapbox/node-sqlite3/blob/918052b538b0effe6c4a44c74a16b2749c08a0d2/deps/sqlite-autoconf-3360000.tar.gz horrendous. This explains why it takes forever to clone that repository. People who don't believe in git submodules, there's even an official Git mirror at: github.com/sqlite/sqlite

It appears to spawn its own threads via its C extension (since JavaScript is single threaded and and SQLite is not server-based), which allows for parallel queries using multiple threads: github.com/mapbox/node-sqlite3/blob/v5.0.2/src/threading.h

Hello world example: nodejs/node-sqlite3/index.js.

As of 2021, this had slumped back a bit, as maintainers got tired. Unmerged pull requests started piling up, and the better-sqlite3 Node.js package started pulling ahead a little.

github.com/mapbox/node-sqlite3/issues/1381 FATAL ERROR: Error::ThrowAsJavaScriptException napi_throw with Node.js worker_threads vs better-sqlite3 Node.js package github.com/JoshuaWise/better-sqlite3/issues/237

`better-sqlite3` Node.js package

 0  0

As claimed on their README, their operation truly appears to be 10x faster than the node-sqlite package!! It is insane!! How can that other package still exist at all?

The only big problem was the lack of ORM, but people are looking into that by adding it to Sequelize:

SQL function

 0  0

 Tagged

PostgreSQL function

SQL set returning function

 0  0

PostgreSQL: www.postgresql.org/docs/current/functions-srf.html

SQL `generate_series`

SQL aggregate function

 0  0

SQL `COUNT` function

 0  0

Have a look at some interesting examples under nodejs/sequelize/raw/many_to_many.js.

SQL keyword

 0  0

SQL CASCADE

 0  0

stackoverflow.com/questions/59297/when-why-to-use-cascading-in-sql-server

DELETE (SQL)

 0  0

 Tagged

DELETE with JOIN (SQL)

Delete all duplicate rows in SQL

 0  0

SQLite with rowid: stackoverflow.com/questions/8190541/deleting-duplicate-rows-from-sqlite-database
- PostgreSQL is ctid: stackoverflow.com/questions/14626481/rowid-equivalent-in-postgres-9-2
SQL Server has crazy "CTEs" change backing table extension: stackoverflow.com/questions/18390574/how-to-delete-duplicate-rows-in-sql-server
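A minimal sketch of the SQLite rowid approach, run through Python's built-in sqlite3 module. The table t and its columns x and y are hypothetical, and "duplicate" is assumed to mean that all columns are equal:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER, y TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "a"), (1, "a"), (2, "b"), (2, "b"), (2, "b"), (3, "c")],
)

# Keep only the row with the smallest rowid in each group of identical (x, y).
conn.execute(
    "DELETE FROM t WHERE rowid NOT IN (SELECT MIN(rowid) FROM t GROUP BY x, y)"
)

remaining = conn.execute("SELECT x, y FROM t ORDER BY x").fetchall()
print(remaining)
```

This relies on the implicit rowid column, so it would not work as is on a WITHOUT ROWID table.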

GROUP BY (SQL)

 0  0

HAVING (SQL)

 0  0

HAVING vs WHERE

 0  0

INSERT (SQL)

 0  0

Upsert (ON CONFLICT UPDATE)

 0  0

UPSERT is extremely handy, and reduces the number of find, check on server, update loops. But RETURNING is a fundamental part of that (to get the updated/existing) ID. Can't believe SQL hasn't standardized it yet as of 2022. But both SQLite and Postgres support it with similar syntax thankfully.

nodejs/sequelize/raw/upsert.js
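A small sketch of upsert plus RETURNING on SQLite through Python's built-in sqlite3 module. This is not the content of upsert.js: the users table, its columns and the visit helper are invented for illustration. RETURNING needs SQLite 3.35 or newer, so the sketch falls back to a separate SELECT on older builds:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT UNIQUE, visits INTEGER)"
)

# Assumption: the linked SQLite library may be older than 3.35 and lack RETURNING.
HAS_RETURNING = sqlite3.sqlite_version_info >= (3, 35, 0)


def visit(name):
    """Insert the user or bump its counter, and return (id, visits)."""
    sql = (
        "INSERT INTO users (name, visits) VALUES (?, 1) "
        "ON CONFLICT (name) DO UPDATE SET visits = visits + 1"
    )
    if HAS_RETURNING:
        # One round trip: the row comes back whether it was inserted or updated.
        return conn.execute(sql + " RETURNING id, visits", (name,)).fetchone()
    conn.execute(sql, (name,))
    return conn.execute(
        "SELECT id, visits FROM users WHERE name = ?", (name,)
    ).fetchone()


first = visit("alice")
second = visit("alice")
other = visit("bob")
print(first, second, other)
```

The same INSERT ... ON CONFLICT ... DO UPDATE ... RETURNING statement should also be accepted by PostgreSQL, modulo the placeholder syntax of the driver.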

Upsert with `NOT NULL` column

 0  0

Attempt at nodejs/sequelize/raw/upsert.js:

stackoverflow.com/questions/48816629/on-conflict-do-nothing-in-postgres-with-a-not-null-constraint OP unable to provide a minimal example, but it is likely the same problem
dba.stackexchange.com/questions/292428/postgresql-upsert-issue-with-not-null-columns

Related on more complex constraints:

JOIN (SQL)

 0  0

SQL prefix column names with the table they came from

 0  0

It is mind blowing that this is not possible... the only way to avoid ambiguity in JOINs with column name conflicts is to give aliases to each column...
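A sketch of the aliasing workaround, generating the aliases programmatically so that at least they don't have to be typed by hand. The users and posts tables and the prefixed helper are hypothetical, and the column lists are hard-coded because identifiers cannot be passed as query parameters:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read result columns by name
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, name TEXT, user_id INTEGER);
INSERT INTO users VALUES (1, 'alice');
INSERT INTO posts VALUES (10, 'hello', 1);
""")


def prefixed(table, columns):
    """Build 'table.col AS table_col, ...' from trusted, hard-coded names."""
    return ", ".join(f"{table}.{c} AS {table}_{c}" for c in columns)


select_list = prefixed("users", ["id", "name"]) + ", " + prefixed("posts", ["id", "name"])
row = conn.execute(
    f"SELECT {select_list} FROM posts INNER JOIN users ON posts.user_id = users.id"
).fetchone()
print(dict(row))
```

Both tables have id and name columns, so without the aliases the result would contain two columns called id and two called name.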

INNER JOIN

 0  0

OUTER JOIN

 0  0

LIKE (SQL)

 0  0

SELECT (SQL)

 0  0

SELECT FOR UPDATE

 0  0

An example where SELECT FOR UPDATE is a good solution to an use case can be seen at: nodejs/sequelize/raw/parallel_select_and_update.js.

SELECT FOR UPDATE vs/together with the SQL transaction isolation level is commented at: stackoverflow.com/questions/10935850/when-to-use-select-for-update.

SQL stored procedure

 0  0

SQL FUNCTION keyword

 0  0

SQL PROCEDURE

 0  0

SQL TRIGGER

 0  0

SQL's implementation of database triggers.

This feature is really cool, as it allows you to keep caches up to date!

In particular, everything that happens in a trigger happens as if it were in a transaction. This way, you need fewer explicit transactions when you use triggers. It is a bit like the advantages of SQL CASCADE.

DBMS:

ORM:

Sequelize: SQL TRIGGER in Sequelize

ISO SQL TRIGGER syntax

 0  0

TODO what is the standard compliant syntax?

PostgreSQL requires you to define a SQL stored procedure: stackoverflow.com/questions/28149494/is-it-possible-to-create-trigger-without-execute-procedure-in-postgresql Their syntax may be standard compliant, not sure about the EXECUTE part. Their docs: www.postgresql.org/docs/current/sql-createtrigger.html

SQLite does not support SQL stored procedures at all, so maybe that's why they can't be standard compliant here: stackoverflow.com/questions/3335162/creating-stored-procedure-in-sqlite

SQL:1999 11.38 covers "Trigger definition". The Abstract syntax tree starts with the CREATE TRIGGER and ends in:

<triggered SQL statement> ::=
  <SQL procedure statement>

This is defined at 13.5 "SQL procedure statement", but that is humongous and I'm not sure what it is at all.

nodejs/sequelize/raw/trigger_count.js

 0  0

In this example we cache track the number of posts per user on a cache column.
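The same idea can be sketched directly on SQLite with Python's built-in sqlite3 module. This is an independent illustration rather than the contents of trigger_count.js, and the users and posts tables, the post_count column and the trigger names are all assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
  id INTEGER PRIMARY KEY,
  post_count INTEGER NOT NULL DEFAULT 0  -- the cache column
);
CREATE TABLE posts (
  id INTEGER PRIMARY KEY,
  user_id INTEGER NOT NULL REFERENCES users(id)
);

-- Keep users.post_count in sync with the rows of posts.
CREATE TRIGGER posts_after_insert AFTER INSERT ON posts BEGIN
  UPDATE users SET post_count = post_count + 1 WHERE id = NEW.user_id;
END;
CREATE TRIGGER posts_after_delete AFTER DELETE ON posts BEGIN
  UPDATE users SET post_count = post_count - 1 WHERE id = OLD.user_id;
END;

INSERT INTO users (id) VALUES (1), (2);
INSERT INTO posts (user_id) VALUES (1), (1), (2);
DELETE FROM posts WHERE id = 1;
""")

counts = conn.execute("SELECT id, post_count FROM users ORDER BY id").fetchall()
print(counts)
```

User 1 ends up with one post after the delete, and no explicit transaction was needed to keep the counter consistent. A complete version would also need a trigger for updates that move a post to another user.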

UNION (SQL)

 0  0

Basic example tested on SQLite 3.40.1, Ubuntu 23.04:

sqlite3 :memory: 'select 1 union select 2'

output:

1
2

Two columns two rows:

sqlite3 :memory: <<EOF
select * from (values (1, 2), (2, 3))
union
select * from (values (2, 3), (3, 4))
EOF

output:

1|2
2|3
3|4

Note how duplicates are removed. To keep them, we use UNION ALL instead:

sqlite3 :memory: <<EOF
select * from (values (1, 2), (2, 3))
union all
select * from (values (2, 3), (3, 4))
EOF

output:

1|2
2|3
2|3
3|4

UPDATE (SQL)

 0  0

Update multiple rows with different values in a single SQL query

 0  0

This happens when you calculate a bunch of values on your program, and then want to save the to SQL.

Upsert is an option, but it fails if you have a NOT NULL column: Upsert with NOT NULL column
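One possible alternative that sidesteps the NOT NULL issue is a VALUES list in a CTE plus a correlated subquery, sketched here on SQLite via Python's built-in sqlite3 module. The table t, its columns and the update_scores helper are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT NOT NULL, score INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)", [(1, "a", 0), (2, "b", 0), (3, "c", 0)]
)


def update_scores(conn, new_scores):
    """Set t.score for several ids in a single statement; new_scores is {id: score}."""
    if not new_scores:
        return
    placeholders = ", ".join("(?, ?)" for _ in new_scores)
    params = [value for pair in new_scores.items() for value in pair]
    conn.execute(
        f"WITH new(id, score) AS (VALUES {placeholders}) "
        "UPDATE t SET score = (SELECT new.score FROM new WHERE new.id = t.id) "
        "WHERE id IN (SELECT id FROM new)",
        params,
    )


update_scores(conn, {1: 10, 3: 30})
rows = conn.execute("SELECT id, name, score FROM t ORDER BY id").fetchall()
print(rows)
```

Since this is a plain UPDATE, the NOT NULL name column never has to be supplied. For very large batches the number of bound parameters allowed by the database becomes the limit.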

Bibliography:

UPDATE with JOIN (SQL)

 0  0

Dumping examples under nodejs/sequelize/raw/many_to_many.js.

Not possible without subqueries in the standard syntax, a huge shame: stackoverflow.com/questions/1293330/how-can-i-do-an-update-statement-with-join-in-sql-server
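A sketch of the subquery workaround on SQLite, through Python's built-in sqlite3 module. The users and posts tables and the banned and hidden columns are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, banned INTEGER NOT NULL);
CREATE TABLE posts (
  id INTEGER PRIMARY KEY,
  user_id INTEGER NOT NULL,
  hidden INTEGER NOT NULL DEFAULT 0
);
INSERT INTO users VALUES (1, 0), (2, 1);
INSERT INTO posts (id, user_id) VALUES (10, 1), (11, 2), (12, 2);
""")

# What we would like to write as a JOIN, expressed with a subquery instead:
# hide every post whose author is banned.
conn.execute(
    "UPDATE posts SET hidden = 1 "
    "WHERE user_id IN (SELECT id FROM users WHERE banned = 1)"
)

hidden = conn.execute("SELECT id, hidden FROM posts ORDER BY id").fetchall()
print(hidden)
```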

The UPDATE + FROM extension exists in a few DBMSs:

ORM:

Sequelize: UPDATE with JOIN in Sequelize

DELETE with JOIN (SQL)

 0  0

Demo under: nodejs/sequelize/raw/many_to_many.js.

No way in the SQL standard apparently. You'd hope that the implementation status would be similar to UPDATE with JOIN, but not even!

PostgreSQL: possible with DELETE FROM USING: stackoverflow.com/questions/11753904/postgresql-delete-with-inner-join
SQLite: not possible without subqueries as of 3.35 so far: stackoverflow.com/questions/24511153/how-delete-table-inner-join-with-other-table-in-sqlite, and there do not appear to be any relevant features at: www.sqlite.org/lang_delete.html
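A sketch of the SQLite subquery workaround, via Python's built-in sqlite3 module. The tags and post_tags tables are hypothetical and only loosely inspired by a many-to-many setup:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE post_tags (post_id INTEGER NOT NULL, tag_id INTEGER NOT NULL);
INSERT INTO tags VALUES (1, 'sql'), (2, 'spam');
INSERT INTO post_tags VALUES (10, 1), (10, 2), (11, 2), (12, 1);
""")

# Delete every post_tags row that joins to the tag named 'spam',
# using a correlated EXISTS instead of a JOIN.
conn.execute(
    "DELETE FROM post_tags WHERE EXISTS ("
    "  SELECT 1 FROM tags"
    "  WHERE tags.id = post_tags.tag_id AND tags.name = ?"
    ")",
    ("spam",),
)

left = conn.execute(
    "SELECT post_id, tag_id FROM post_tags ORDER BY post_id"
).fetchall()
print(left)
```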

ORM:

Sequelize: no support of course: stackoverflow.com/questions/40890131/sequelize-destroy-record-with-join

SQL standard

 0  0

A quick look at PostgreSQL's compliance notes: www.postgresql.org/docs/13/features.html shows the complete utter mess that this standard is. Multiple compliance levels that no one fully implements and optional features everywhere.

SQL standard version

 0  0

SQL:1999

 0  0

SQL application

 0  0

SQL histogram

 0  0

OK, there's a billion questions:

SQL Server
- stackoverflow.com/questions/485409/generating-a-histogram-from-column-values-in-a-database OP did not know the difference between count and histogram :-) But it's the number one Google result.
- stackoverflow.com/questions/19103991/create-range-bins-from-sql-server-table-for-histograms has a minor extra group by twist, but otherwise fine
- stackoverflow.com/questions/16268441/generate-histogram-in-sql-server
SQLite
- stackoverflow.com/questions/67514208/how-to-optimise-creating-histogram-bins-in-sqlite perf only, benchmarking would be needed. SQLite.
- stackoverflow.com/questions/32155449/create-a-histogram-with-a-dynamic-number-of-partitions-in-sqlite variable bin size, same number of entries per bin
- stackoverflow.com/questions/60348109/histogram-for-time-periods-using-sqlite-regular-buckets-1h-wide time
MySQL: stackoverflow.com/questions/1764881/getting-data-for-histogram-plot MySQL appears to extend ROUND to also round by integers: ROUND(numeric_value, -2), but this is not widely portable which is a shame
stackoverflow.com/questions/72367652/populating-empty-bins-in-a-histogram-generated-using-sql specifically asks about empty bins, which is amazing. Amazon Redshift dialect unfortunately, but answer provided works widely, and Redshift was forked from PostgreSQL, so there's hope. Those newb open source server focused projects that don't use AGPL!

Let's try it on SQLite 3.40.1, Ubuntu 23.04. Data setup:

sqlite3 tmp.sqlite 'create table t(x integer)'
sqlite3 tmp.sqlite <<EOF
insert into t values
  (0),
  (2),
  (2),
  (3),

  (5),
  (6),
  (6),
  (8),
  (9),

  (17)
EOF
sqlite3 tmp.sqlite 'create index tx on t(x)'

For a bin size of 5 ignoring empty ranges we can:

sqlite3 tmp.sqlite <<EOF
select floor(x/5)*5 as x,
       count(*) as cnt
from t
group by 1
order by 1
EOF

which produces the desired:

0|4
5|5
15|1

And to consider empty ranges we can use SQL generate_series as per stackoverflow.com/questions/72367652/populating-empty-bins-in-a-histogram-generated-using-sql:

sqlite3 tmp.sqlite <<EOF
select x, sum(cnt) from (
  select floor(x/5)*5 as x,
         count(*) as cnt
    from t
    group by 1
  union
  select *, 0 as cnt from generate_series(0, 15, 5)
)
group by x
EOF

which outputs the desired:

0|4
5|5
10|0
15|1
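The generate_series table-valued function is an extension bundled with the sqlite3 command line shell, and floor requires the math functions to be compiled in, so both may be missing from other SQLite builds. A sketch of a variant that should avoid both, using a recursive CTE for the bins and integer division for the rounding, run through Python's built-in sqlite3 module. It assumes non-negative integer x, and the names bins and lo are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?)", [(v,) for v in (0, 2, 2, 3, 5, 6, 6, 8, 9, 17)]
)

BIN = 5
histogram = conn.execute(
    """
    WITH RECURSIVE bins(lo) AS (
      SELECT 0
      UNION ALL
      SELECT lo + :bin FROM bins WHERE lo + :bin <= (SELECT MAX(x) FROM t)
    )
    SELECT bins.lo, COUNT(t.x)  -- COUNT(t.x) ignores the NULLs of empty bins
    FROM bins LEFT JOIN t ON (t.x / :bin) * :bin = bins.lo
    GROUP BY bins.lo
    ORDER BY bins.lo
    """,
    {"bin": BIN},
).fetchall()
print(histogram)
```

The LEFT JOIN is what keeps the empty bin at 10 in the result with a count of 0.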

SQL 2D histogram

 0  0

Let's try it on SQLite 3.40.1, Ubuntu 23.04. Data setup:

sqlite3 tmp.sqlite 'create table t(x integer, y integer)'
sqlite3 tmp.sqlite <<EOF
insert into t values
  (0, 0),
  (1, 1),
  (2, 2),
  (3, 3),
  (4, 4),
  (5, 5),
  (6, 6),
  (7, 7),
  (8, 8),
  (9, 9),
  (10, 10),
  (11, 11),
  (12, 12),
  (13, 13),
  (14, 14),
  (15, 15),
  (16, 16),
  (17, 17),
  (18, 18),
  (19, 19),

  (2, 18)
EOF
sqlite3 tmp.sqlite 'create index txy on t(x, y)'

For a bin size of 5 ignoring empty ranges we can:

sqlite3 tmp.sqlite <<EOF
select
  floor(x/5)*5 as x,
  floor(y/5)*5 as y,
  count(*) as cnt
from t
group by 1, 2
order by 1, 2
EOF

which produces the desired:

0|0|5
0|15|1
5|5|5
10|10|5
15|15|5

And to consider empty ranges we can use SQL generate_series as per stackoverflow.com/questions/72367652/populating-empty-bins-in-a-histogram-generated-using-sql:

sqlite3 tmp.sqlite <<EOF
select x, y, sum(cnt) from (
  select
      floor(x/5)*5 as x,
      floor(y/5)*5 as y,
      count(*) as cnt
    from t
    group by 1, 2
  union
  select *, 0 as cnt from generate_series(0, 15, 5) inner join (select * from generate_series(0, 15, 5))
)
group by x, y
EOF

which outputs the desired:

0|0|5
0|5|0
0|10|0
0|15|1
5|0|0
5|5|5
5|10|0
5|15|0
10|0|0
10|5|0
10|10|5
10|15|0
15|0|0
15|5|0
15|10|0
15|15|5

SQL tree traversal

 0  0

Example: nodejs/sequelize/raw/tree.js

Closure table

 0  0

Nested set model in SQL (Nested set index)

 0  0

How to implement Nested set model in SQL:

stackoverflow.com/questions/192220/what-is-the-most-efficient-elegant-way-to-parse-a-flat-table-into-a-tree/42781302#42781302 contains the correct left/size representation and update queries, which makes it much easier to maintain the tree without having to worry about the sizes of siblings which are constant
stackoverflow.com/questions/192220/what-is-the-most-efficient-elegant-way-to-parse-a-flat-table-into-a-tree/194031#194031 amazing ASCII art representations of the structure. Unfortunately uses a wonky left/right representation, rather than the much more natural left/size representation from the other post

SQL feature

 0  0

Generated column

 0  0

SQL RECURSIVE query

 0  0

Minimal example: nodejs/sequelize/raw/recursive.js

More advanced SQL tree traversal examples: nodejs/sequelize/raw/tree.js

PostgreSQL docs: www.postgresql.org/docs/16/queries-with.html#QUERIES-WITH-RECURSIVE
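For a quick feel of the syntax without the Node.js setup, here is a tiny sketch on SQLite via Python's built-in sqlite3 module. It is not the content of recursive.js, and the CTE name counter is arbitrary:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
numbers = [
    row[0]
    for row in conn.execute(
        """
        WITH RECURSIVE counter(i) AS (
          SELECT 1                               -- base case
          UNION ALL
          SELECT i + 1 FROM counter WHERE i < 5  -- recursive step
        )
        SELECT i FROM counter ORDER BY i
        """
    )
]
print(numbers)
```

The WHERE clause of the recursive step is what makes the recursion stop: once it produces no new rows, the query ends.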

SQL RECURSIVE prevent infinite recursion

 0  0

Example under: nodejs/sequelize/raw/tree.js
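One common technique, which may or may not be the one used in tree.js, is to join the base case and the recursive step with UNION rather than UNION ALL, so that rows that were already produced are discarded and cycles cannot loop forever. A sketch on SQLite via Python's built-in sqlite3 module, with a hypothetical edge table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edge (src INTEGER, dst INTEGER);
-- 1 -> 2 -> 3 -> 1 is a cycle, and 3 -> 4 leaves it.
INSERT INTO edge VALUES (1, 2), (2, 3), (3, 1), (3, 4);
""")

reachable = [
    row[0]
    for row in conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
          SELECT 1
          UNION  -- not UNION ALL: nodes seen before are dropped
          SELECT edge.dst FROM edge JOIN reach ON edge.src = reach.node
        )
        SELECT node FROM reach ORDER BY node
        """
    )
]
print(reachable)
```

With UNION ALL the same query would keep revisiting nodes 1, 2 and 3 and never terminate. This only works when the recursive rows repeat exactly, so a query that also carries a depth or path column needs an explicit limit or cycle check instead.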

SQL spatial index

 0  0

PostgreSQL spatial index

 0  0

PostgreSQL GIST

 0  0

www.postgresql.org/docs/15/gist.html
www.postgresql.org/docs/15/datatype-geometric.html
medium.com/postgres-professional/indexes-in-postgresql-5-gist-86e19781b5db the only example on the net!

The highly underdocumented built-in module, that supports SQL spatial index and a lot more.

Quite horrendous as it only seems to work on geometric types and not existing columns. But why.

And it uses custom operators, where standard operators would have been just fine for points...

Minimal runnable example with points:

set -x
time psql -c 'drop table if exists t'
time psql -c 'create table t(p point)'
time psql -c "insert into t select (point ('(' || generate_series || ',' || generate_series || ')')) from generate_series(1, 10000000)"
time psql -c 'create index on t using gist(p)'
time psql -c "select count(*) from t where p <@ box '(1000000,1000000),(9000000,2000000)'"

The index creation unfortunately took 100s, so it will not scale to 1B points very well, which is a shame.

Some sources about it:

stackoverflow.com/questions/28292198/how-to-port-simple-spatial-index-using-sqlite-r-trees-to-postgres

PostGIS

 0  0

postgis.net/

The third-party module, which clutters up any searches you make for the built-in one.

SQL subquery (CTE)

 0  0

Common Table Expression (CTE)

 0  0

rm -f tmp.sqlite
sqlite3 tmp.sqlite 'create table t(i integer)'
sqlite3 tmp.sqlite 'insert into t values (1), (2)'
sqlite3 tmp.sqlite 'with mycte as ( select * from t ) delete from mycte where i = 1'
sqlite3 tmp.sqlite 'select * from t'

CTE insert values

 0  0

Useful for testing: stackoverflow.com/questions/21819183/how-to-use-ctes-with-update-delete-on-sqlite

sqlite3 :memory: 'WITH t (i, j) AS (VALUES (1, -1), (2, -2)) SELECT * FROM t'

SQL transaction

 0  0

SQL transaction isolation level

 0  0

Each transaction isolation level specifies what can or cannot happen when two queries are being run in parallel, i.e.: the memory semantics of the system.

Remember that queries can affect thousands of rows, and database systems like PostgreSQL can run multiple such queries at the same time.

Good summary on the PostgreSQL page: www.postgresql.org/docs/14/transaction-iso.html

Implementation specifics:

SQLite isolation levels

nodejs/sequelize/raw/parallel_create_delete_empty_tag.js is an example which experimentally seems to be solved by REPEATABLE READ, although we are not sure that this is truly the case and why. What is clear is that the example is not solved by the SQL READ COMMITTED isolation level.

In PostgreSQL, this is the first isolation level which can lead to PostgreSQL serialization failures; this does not happen with the SQL READ COMMITTED isolation level in that DBMS. You then have to retry the transaction.
