PostgreSQL feels good.
Had a look at the source tree, and also felt good.
If Oracle is the Microsoft of database, Postgres is the Linux, and MySQL (or more precisely MariaDB) is the FreeBSD (i.e. the one that got delayed by legal issues). Except that their software licenses were accidentally swapped.
The only problem with Postgres is its name. PostgreSQL is so unpronounceable and so untypeable that you should just call it "Postgres" like everyone else.
On Ubuntu 20.10 PostgreSQL 12.6, login with psql on my default username without sudo fails with: stackoverflow.com/questions/11919391/postgresql-error-fatal-role-username-does-not-exist
This is the one that worked on Ubuntu 21.04: stackoverflow.com/questions/11919391/postgresql-error-fatal-role-username-does-not-exist/38444152#38444152
sudo -u postgres createuser -s $(whoami)
createdb $(whoami)
Explanation:
  • sudo -u postgres uses the postgres user via peer authentication
  • -s in createuser -s: make it a superuser
  • createdb: TODO why do we have to create a table with the same name as the user? Otherwise login fails.
You can now run psql without any password. This works without password due to peer authentication,
sudo cat /etc/postgresql/12/main/pg_hba.conf
shows that peer authentication is available to all users apparently:
local   all             postgres                                peer

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     peer
List users:
psql -c '\du'
output:
                                    List of roles
  Role name  |                         Attributes                         | Member of 
-------------+------------------------------------------------------------+-----------
 ciro        | Superuser, Create role, Create DB                          | {}
 owning_user |                                                            | {}
 postgres    | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
Delete user later on:
psql -c 'DROP USER username;'
Create a database:
createdb testdb0
Help toplevel:
help
Get help for Postgres commands such as \h and so on:
\?
List supported SQL commands:
\h
Show syntax for one type of command:
\h SELECT
List all databases:
psql -c '\l'
which shows:
    Name     |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
-------------+----------+----------+-------------+-------------+-----------------------
 ciro        | postgres | UTF8     | en_GB.UTF-8 | en_GB.UTF-8 |
 postgres    | postgres | UTF8     | en_GB.UTF-8 | en_GB.UTF-8 |
 template0   | postgres | UTF8     | en_GB.UTF-8 | en_GB.UTF-8 | =c/postgres          +
             |          |          |             |             | postgres=CTc/postgres
 template1   | postgres | UTF8     | en_GB.UTF-8 | en_GB.UTF-8 | =c/postgres          +
             |          |          |             |             | postgres=CTc/postgres
 testdb0     | postgres | UTF8     | en_GB.UTF-8 | en_GB.UTF-8 |
(6 rows)
Delete a database:
psql -c 'DROP DATABASE "testdb0";'
If you didn't give a database from the command line e.g.:
psql
you can do that afterwards with:
\c testdb0
Let's create a table and test that it is working:
psql testdb0 -c 'CREATE TABLE table0 (int0 INT, char0 CHAR(16));'
List tables, no special tables:
psql testdb0 -c '\dt'
gives:
        List of relations
 Schema |  Name  | Type  | Owner
--------+--------+-------+-------
 public | table0 | table | ciro
(1 row)
View table schema: stackoverflow.com/questions/109325/postgresql-describe-table
psql testdb0 -c '\d+ table0'
output:
                                      Table "public.table0"
 Column |     Type      | Collation | Nullable | Default | Storage  | Stats target | Description 
--------+---------------+-----------+----------+---------+----------+--------------+-------------
 int0   | integer       |           |          |         | plain    |              | 
 char0  | character(16) |           |          |         | extended |              | 
Insert some data into it and get the data out:
psql testdb0 -c "INSERT INTO table0 (int0, char0) VALUES (2, 'two'), (3, 'three'), (5, 'five'), (7, 'seven');"
psql testdb0 -c 'SELECT * FROM table0;'
output:
 int0 |      char0
------+------------------
    2 | two
    3 | three
    5 | five
    7 | seven
(4 rows)
Delete the table:
psql testdb0 -c 'DROP TABLE table0;'
This one is good: stackoverflow.com/questions/36533429/generate-random-string-in-postgresql/44200391#44200391 as it also describes how to generate multiple values.
with symbols(characters) as (VALUES ('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'))
select string_agg(substr(characters, (random() * (length(characters) - 1) + 1)::INTEGER, 1), '')
from symbols
join generate_series(1,8) as word(chr_idx) on 1 = 1 -- word length
join generate_series(1,10000) as words(idx) on 1 = 1 -- # of words
group by idx;
Then you can insert it into a row with:
create table tmp(s text);
insert into tmp(s)
  select s from
  (
    with symbols(characters) as (VALUES ('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'))
    select string_agg(substr(characters, (random() * (length(characters) - 1) + 1)::INTEGER, 1), '') as asdf
    from symbols
    join generate_series(1,8) as word(chr_idx) on 1 = 1 -- word length
    join generate_series(1,10000) as words(idx) on 1 = 1 -- # of words
    group by idx
  ) as sub(s);
A more convenient approach is likely to define the function:
CREATE OR REPLACE FUNCTION random_string(int) RETURNS TEXT as $$
select
  string_agg(substr(characters, (random() * length(characters) + 1)::integer, 1), '') as random_word
from (values('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789    --')) as symbols(characters)
  join generate_series(1, $1) on 1 = 1
$$ language sql;  
And then:
create table tmp(s text, t text);
insert into tmp(s) select random_string(10) from generate_series(10);
This section was tested on Ubuntu 24.10, PostgreSQL 16.6.
Let's create some test data like this:
time psql tmp -c 'DROP TABLE IF EXISTS fts;'
time psql tmp -c 'CREATE TABLE fts(s TEXT, i INTEGER);'
time psql tmp <<'EOF'
INSERT INTO fts SELECT
  i::text || ' ' ||
    (i * 2  )::text || ' ' ||
    (i * 5  )::text || ' ' ||
    (i * 7  )::text || ' ' ||
    (i * 11 )::text || ' ' ||
    (i * 13 )::text || ' ' ||
    (i * 17 )::text || ' ' ||
    (i * 23 )::text || ' ' ||
    (i * 29 )::text || ' ' ||
    (i * 31 )::text
  ,
  i % 100
FROM generate_series(1::bigint, 100000000::bigint) AS s(i);
EOF
The creation time was 2m13s, and the final size was:
     table_name    | pg_size_pretty | pg_total_relation_size 
------------------+----------------+------------------------
 fts              | 13 GB          |            14067326976
This test data will be simple to predict what each line contains so we can make educated queries, while also posing some difficulty to the RDMS. As per:
time psql tmp -c 'SELECT * FROM fts LIMIT 10;'
the first columns look like:
                  s                  | i  
-------------------------------------+----
 1 2 5 7 11 13 17 23 29 31           |  1
 2 4 10 14 22 26 34 46 58 62         |  2
 3 6 15 21 33 39 51 69 87 93         |  3
 4 8 20 28 44 52 68 92 116 124       |  4
 5 10 25 35 55 65 85 115 145 155     |  5
 6 12 30 42 66 78 102 138 174 186    |  6
 7 14 35 49 77 91 119 161 203 217    |  7
 8 16 40 56 88 104 136 184 232 248   |  8
 9 18 45 63 99 117 153 207 261 279   |  9
 10 20 50 70 110 130 170 230 290 310 | 10
We aimed to create a test table of size around 10 GB, as in practice it is around that order of size that index speedups start to become very obvious on a SSD-based system.
Before we create the index, let's see if our non-indexed queries are slow enough for our tests:
time psql tmp -c "SELECT * FROM fts WHERE s LIKE '% 50000000 %';"
which gives:
                                                 s                                                 | i 
---------------------------------------------------------------------------------------------------+---
 10000000 20000000 50000000 70000000 110000000 130000000 170000000 230000000 290000000 310000000   | 0
 25000000 50000000 125000000 175000000 275000000 325000000 425000000 575000000 725000000 775000000 | 0
(2 rows)


real    0m11.758s
user    0m0.017s
sys     0m0.008s
so it should be enough to observe the index speedup.
Now let's create the index. First we create a generated column that splits the strings with to_tsvector, and then we index that split column:
time psql tmp <<'EOF'
ALTER TABLE fts ADD COLUMN s_ts tsvector
  GENERATED ALWAYS AS (to_tsvector('english', s)) STORED;
EOF
time psql tmp -c 'CREATE INDEX s_ts_gin_idx ON fts USING GIN (s_ts);'
These commands took 8m51s and 40m8s and the DB size went up about 5x:
    table_name    | pg_size_pretty | pg_total_relation_size 
------------------+----------------+------------------------
 fts              | 69 GB          |            74487758848
And finally let's try out the index:
time psql tmp -c "SELECT s, i FROM fts WHERE s_ts @@ to_tsquery('english', '50000000');"
which "instantly" gives us in 0m0.129s:
                                                   s                                                   | i 
-------------------------------------------------------------------------------------------------------+---
 10000000 20000000 50000000 70000000 110000000 130000000 170000000 230000000 290000000 310000000       | 0
 25000000 50000000 125000000 175000000 275000000 325000000 425000000 575000000 725000000 775000000     | 0
 50000000 100000000 250000000 350000000 550000000 650000000 850000000 1150000000 1450000000 1550000000 | 0
so the index worked!
We understand from this that it only find exact word hits.
Another important use case is to search for prefixes of words, e.g. as you'd want in a simple autocompletion system. This can be achieved by adding :* at the end of the search term as in:
time psql tmp -c "SELECT s, i FROM fts WHERE s_ts @@ to_tsquery('english', '50000000:*');"
This finishes in the same amount of time, and gives:
                                                     s                                                     | i  
-----------------------------------------------------------------------------------------------------------+----
 10000000 20000000 50000000 70000000 110000000 130000000 170000000 230000000 290000000 310000000           |  0
 38461539 76923078 192307695 269230773 423076929 500000007 653846163 884615397 1115384631 1192307709       | 39
 45454546 90909092 227272730 318181822 500000006 590909098 772727282 1045454558 1318181834 1409090926      | 46
 50000000 100000000 250000000 350000000 550000000 650000000 850000000 1150000000 1450000000 1550000000     |  0
 71428572 142857144 357142860 500000004 785714292 928571436 1214285724 1642857156 2071428588 2214285732    | 72
 100000000 200000000 500000000 700000000 1100000000 1300000000 1700000000 2300000000 2900000000 3100000000 |  0
 29411765 58823530 147058825 205882355 323529415 382352945 500000005 676470595 852941185 911764715         | 65
 25000000 50000000 125000000 175000000 275000000 325000000 425000000 575000000 725000000 775000000         |  0
so now we have cool hits such as 500000000, 500000004, 500000005, 500000007 and 500000006. The syntax is also mentioned at:
Next we can also try some other queries with multiple terms. Text must contain two words with &:
time psql tmp -c "SELECT s, i FROM fts WHERE s_ts @@ to_tsquery('english', '50000000 & 175000000');"
gives:
                                                   s                                                   | i 
-------------------------------------------------------------------------------------------------------+---
 25000000 50000000 125000000 175000000 275000000 325000000 425000000 575000000 725000000 775000000     | 0
Text can contain either word with |:
time psql tmp -c "SELECT s, i FROM fts WHERE s_ts @@ to_tsquery('english', '50000000 | 175000000');"
gives:
                                                    s                                                    | i 
---------------------------------------------------------------------------------------------------------+---
 10000000 20000000 50000000 70000000 110000000 130000000 170000000 230000000 290000000 310000000         | 0
 50000000 100000000 250000000 350000000 550000000 650000000 850000000 1150000000 1450000000 1550000000   | 0
 87500000 175000000 437500000 612500000 962500000 1137500000 1487500000 2012500000 2537500000 2712500000 | 0
 25000000 50000000 125000000 175000000 275000000 325000000 425000000 575000000 725000000 775000000       | 0
 35000000 70000000 175000000 245000000 385000000 455000000 595000000 805000000 1015000000 1085000000     | 0
Text can contain the given words sequentially:
time psql tmp -c "SELECT s, i FROM fts WHERE s_ts @@ to_tsquery('english', '50000000 <-> 125000000 <-> 175000000');"
gives:
                                                   s                                                   | i 
-------------------------------------------------------------------------------------------------------+---
 25000000 50000000 125000000 175000000 275000000 325000000 425000000 575000000 725000000 775000000     | 0
We can also inspect how words were split by simply doing a SELECT * again:
             s              | i |                                 s_ts                                 
----------------------------+---+----------------------------------------------------------------------
1 2 5 7 11 13 17 23 29 31   | 1 | '1':1 '11':5 '13':6 '17':7 '2':2 '23':8 '29':9 '31':10 '5':3 '7':4
2 4 10 14 22 26 34 46 58 62 | 2 | '10':3 '14':4 '2':1 '22':5 '26':6 '34':7 '4':2 '46':8 '58':9 '62':10
3 6 15 21 33 39 51 69 87 93 | 3 | '15':3 '21':4 '3':1 '33':5 '39':6 '51':7 '6':2 '69':8 '87':9 '93':10
Let's check if the index updates automatically when we do an insert and if insertion seems to have been significantly slowed down by the index:
time psql tmp -c "INSERT INTO fts VALUES ('abcd efgh', 99)"
finishes in:
real    0m0.043s
user    0m0.014s
sys     0m0.010s
so performance is OK. Presumably, the insertion time is proportional to the number of tokens, doing one logarithmic operation per token, so indexing short chunks of text like titles is easy. And then let's find it:
time psql tmp -c "SELECT s, i FROM fts WHERE s_ts @@ to_tsquery('english', 'efgh');"
which finds it with:
     s     | i  
-----------+----
 abcd efgh | 99
so we are all good. Unfortunately, accurate performance benchmarking is a bit harder than that, as the index by default first collects a certain number of updates into memory into the "pending list", before actually inserting them all at once after a certain mass is reached, as documented at: www.postgresql.org/docs/17/gin.html#GIN-IMPLEMENTATION
The last thing that we need to understand is how to_tsvector tokenizes strings for the english language. For example running:
psql -c "select to_tsvector('english', 'A Dog run runs fast faster two Cats: b c to from 1 é befhyph-afthyph.')"
gives:
'1':13
'afthyph':17
'b':9
'befhyph':16
'befhyph-afthyph':15
'c':10
'cat':8
'dog':2
'fast':5
'faster':6
'run':3,4
'two':7
'é':14
so we understand some of the heuristic normalizations:
The full list of languages available can be obtained with:
psql -c '\dF'
On Ubuntu 24.10, the list contains major world languages, plus the special simple configuration such that:
psql -c "select to_tsvector('simple', 'A Dog run runs fast faster two Cats: b c to from 1 é befhyph-afthyph.')"
gives:
'1':13
'a':1
'afthyph':17
'b':9
'befhyph':16
'befhyph-afthyph':15
'c':10
'cats':8
'dog':2
'fast':5
'faster':6
'from':12
'run':3
'runs':4
'to':11
'two':7
'é':14
so we understand that it is similar to english but it does not:
  • seem to have any stopwords
  • does not do singularization normalization
Also posted at:
In order to create a test user with password instead of peer authentication, let's create test user:
createuser -P user0
createdb user0
-P makes it prompt for the users password.
Alternatively, to create the password non-interactively stackoverflow.com/questions/42419559/postgres-createuser-with-password-from-terminal:
psql -c "create role NewRole with login password 'secret'"
Can't find a way using the createuser helper.
We can then login with that password with:
psql -U user0 -h localhost
which asks for the password we've just set, because the -h option turns off peer authentication, and turns off password authentication.
The password can be given non-interactively as shown at stackoverflow.com/questions/6405127/how-do-i-specify-a-password-to-psql-non-interactively with the PGPASSWORD environment variable:
PGPASSWORD=a psql -U user0 -h localhost
Now let's create a test database which user0 can access with an existing superuser account:
createdb user0db0
psql -c 'GRANT ALL PRIVILEGES ON DATABASE user0db0 TO user0'
We can check this permission with:
psql -c '\l'
which now contains:
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 user0db0  | ciro     | UTF8     | en_GB.UTF-8 | en_GB.UTF-8 | =Tc/ciro             +
           |          |          |             |             | ciro=CTc/ciro        +
           |          |          |             |             | user0=CTc/ciro
The permission letters are explained at:
user0 can now do the usual table operations on that table:
PGPASSWORD=a psql -U user0 -h localhost user0db0 -c 'CREATE TABLE table0 (int0 INT, char0 CHAR(16));'
PGPASSWORD=a psql -U user0 -h localhost user0db0 -c "INSERT INTO table0 (int0, char0) VALUES (2, 'two'), (3, 'three'), (5, 'five'), (7, 'seven');"
PGPASSWORD=a psql -U user0 -h localhost user0db0 -c 'SELECT * FROM table0;'
Uses the name of the current Linux user to login without a password.
Ubuntu 21.10 has a certain default level of logging by default to:
/var/log/postgresql/postgresql-13-main.log
but it does not log everything, only/mostly errors it seems.
Setting:
log_statement = 'all'
under:
/etc/postgresql/13/main/postgresql.conf
and then restarting the server:
sudo service restart postgresql
just works.
When using SQL REPEATABLE READ isolation level and SQL SERIALIZABLE isolation level, concurrent transactions may fail with a serialization failure, and then you might need to retry them. You server code or your ORM must always account for that.
A good way to explore when it happens is to use the example
Pattern you always want to generate Generate random text in PostgreSQL:
CREATE TABLE "mytable" ("i" INTEGER, "j" INTEGER);
INSERT INTO "mytable" SELECT i, i*2 FROM generate_series(1, 10) as s(i);

Articles by others on the same topic (0)

There are currently no matching articles.