One "LevelDB" database contains multiple file in a directory. Off the bat inferior to SQLite which stores everything in a single file!
github.com/mdawsonuk/LevelDBDumper/tree/e750a27ff58443ecc410b5c16abbdc539d617387#installation worked on Ubuntu 23.10 Annoying installation, but worked: github.com/mdawsonuk/LevelDBDumper/issues/13
Initial issues off-the-bat:
List databases:
echo 'show dbs' | mongo
Delete database:or:
use mydb
db.dropDatabase()
echo 'db.dropDatabase()' | mongo mydb
View collections within a database:
echo 'db.getCollectionNames()' | mongo mydb
Show all data from one of the collections: stackoverflow.com/questions/24985684/mongodb-show-all-contents-from-all-collections
echo 'db.collectionName.find()' | mongo mydb
Tested as of Ubuntu 20.04, there is no Mongo package available by default due to their change to Server Side Public License, which Debian opposed. Therefore, you have to add their custom PPA as mentioned at: docs.mongodb.com/manual/tutorial/install-mongodb-on-ubuntu/
These examples are written in the Sequelize library using raw queries.
Sequelize is used minimally, just to feed raw queries in transparently to any underlying database, and get minimally parsed results out for us, which we then assert with standard JavaScript. The queries themselves are all written by hand.
By default the examples run on SQLite. Just like the examples from sequelize example, you can set the database at runtime as:
./index.js
or./index.js l
: SQLite./index.js p
: PostgreSQL. You must manually create a database calledtmp
and ensure that peer authentication works for it
Here we list only examples which we believe are standard SQL, and should therefore work across different SQL implementations:
- nodejs/sequelize/raw/index.js: basic hello world to demonstrate the setup and very simple functionality
- nodejs/sequelize/raw/many_to_many.js: illustrates many-to-many relations with JOIN. Contains:
- SQL transaction examples:
- nodejs/sequelize/raw/commit_error.js: stackoverflow.com/questions/27245101/why-should-we-use-rollback-in-sql-explicitly/27245234#27245234 and stackoverflow.com/questions/48277519/how-to-use-commit-and-rollback-in-a-postgresql-function/48277708#48277708 suggest that on PostgreSQL, once something fails inside a transaction, all queries in the current transaction are ignored, and
COMMIT
simply does aROLLBACK
. Let's check. Yup, true for Postgres, but false for SQLite, SQLite just happily runs anything it can, you really needROLLBACK
for it. - SQL isolation level example
- nodejs/sequelize/raw/commit_error.js: stackoverflow.com/questions/27245101/why-should-we-use-rollback-in-sql-explicitly/27245234#27245234 and stackoverflow.com/questions/48277519/how-to-use-commit-and-rollback-in-a-postgresql-function/48277708#48277708 suggest that on PostgreSQL, once something fails inside a transaction, all queries in the current transaction are ignored, and
- GROUP BY and SQL aggregate functions:
- nodejs/sequelize/raw/group_by_extra_column.js: let's see if it blows up or not on different DB systems,
sqlite3
Node.js package allows it:- github.com/sequelize/sequelize/issues/5481#issuecomment-964387232
- dba.stackexchange.com/questions/141594/how-select-column-does-not-list-in-group-by-clause/141600 says that it was allowed in SQL:1999 when there are no ambiguities due to constraints, e.g. when grouping by unique columns
- github.com/postgres/postgres/blob/REL_13_5/src/test/regress/sql/functional_deps.sql#L27 shows that PostgreSQL wants it to work for
UNIQUE NOT NULL
, but they just haven't implemented it as of 13.5, where it only works if you group byPRIMARY KEY
- dba.stackexchange.com/questions/158015/why-can-i-select-all-fields-when-grouping-by-primary-key-but-not-when-grouping-b also says that
UNIQUE NOT NULL
doesn't work. Dan Lenski then points to rationale mailing list thread:
- nodejs/sequelize/raw/group_by_max_full_row.js: here we try to get the full row of each group at which a given column reaches the max of the group
- Postgres: has
SELECT DISCINTCT ON
which works perfectly if you only want one row in case of multiple rows attaining the max.ON
is an extension to the standard unfortunately: www.postgresql.org/docs/9.3/sql-select.html#SQL-DISTINCT Docs specify that it always respectsORDER BY
when selecting the row.- stackoverflow.com/questions/586781/postgresql-fetch-the-row-which-has-the-max-value-for-a-column asks it without the multiple matches use case
- stackoverflow.com/questions/586781/postgresql-fetch-the-rows-which-have-the-max-value-for-a-column-in-each-group/587209#587209 also present in simpler form at stackoverflow.com/questions/121387/fetch-the-rows-which-have-the-max-value-for-a-column-for-each-distinct-value-of/123481#123481 gives a very nice OUTER JOIN only solution! Incredible, very elegant.
- dba.stackexchange.com/questions/171938/get-only-rows-with-max-group-value asks specifically the case of multiple matches to the max
- stackoverflow.com/questions/586781/postgresql-fetch-the-row-which-has-the-max-value-for-a-column asks it without the multiple matches use case
- SQLite:
- stackoverflow.com/questions/48326957/row-with-max-value-per-group-sqlite
- stackoverflow.com/questions/48326957/row-with-max-value-per-group-sqlite/48328243#48328243 teaches us that in SQLite min and max are magic and guarantee that the matching row is returned
- stackoverflow.com/questions/48326957/row-with-max-value-per-group-sqlite/72996649#72996649 Ciro Santilli uses the magic of
ROW_NUMBER
- stackoverflow.com/questions/17277152/sqlite-select-distinct-of-one-column-and-get-the-others/71924314#71924314 get any full row without specifying which, we teach how to specify
- code.djangoproject.com/ticket/22696 WONTFIXed
DISTINCT ON
- stackoverflow.com/questions/50846722/what-is-the-difference-between-postgres-distinct-vs-distinct-on/72997494#72997494
DISTINCT
vsDISTINCT ON
, somewhat related question
- stackoverflow.com/questions/50846722/what-is-the-difference-between-postgres-distinct-vs-distinct-on/72997494#72997494
- stackoverflow.com/questions/48326957/row-with-max-value-per-group-sqlite
- stackoverflow.com/questions/5803032/group-by-to-return-entire-row asks how to take the top N with distinct after order limit. I don't know how to do it in Postgres
- Postgres: has
- nodejs/sequelize/raw/most_frequent.js: illustrates a few variants of findind the mode, including across GROUP
- nodejs/sequelize/raw/group_by_max_n.js: get the top N in each group
- nodejs/sequelize/raw/group_by_extra_column.js: let's see if it blows up or not on different DB systems,
- order results in the same order as
IN
: - LIMIT by a running total: TODO links
Login without password: askubuntu.com/questions/915585/how-to-login-mysql-shell-when-mysql-have-no-passwordworks on Ubuntu 20.10.
sudo mysql
Create user for further logins without
sudo
askubuntu.com/questions/915585/how-to-login-mysql-shell-when-mysql-have-no-password/1325689#1325689:sudo mysql -e "CREATE USER $USER"
Run command from CLI stackoverflow.com/questions/1602904/how-do-you-run-a-single-query-through-mysql-from-the-command-line
mysql -e 'SHOW DATABASES'
Create test user with password:and login as that user:Login with password given on the command line:The
sudo mysql -e 'CREATE USER user0 IDENTIFIED WITH mysql_native_password BY "a"'
sudo mysql -e 'GRANT ALL PRIVILEGES ON database_name.* TO "user0"'
mysql -u user0 -p
mysql -u user0 -pmypassword
IDENTIFIED WITH mysql_native_password
part is to overcome "Client does not support authentication protocol requested by server" when connecting from Node.js.List users:
sudo mysql -e 'SELECT * FROM mysql.user'
View permissions for each user on each DB: serverfault.com/questions/263868/how-to-know-all-the-users-that-can-access-a-database-mysql
sudo mysql -e 'SELECT * FROM mysql.db'
List databases:
sudo mysql -e 'SHOW DATABASES'
Create database:
sudo mysql -e 'CREATE DATABASE mydb0'
Destroy database:
sudo mysql -e 'DROP DATABASE mydb0'
Show tables in database:or:
sudo mysql -e 'SHOW TABLES' mydb0
sudo mysql -e 'SHOW TABLES FROM mydb0'
Dude's a legend. Sells company for a few million. Then forks the open source project next year. Love it.
PostgreSQL feels good.
Its feature set is insanely large! Just look at stuff like: stackoverflow.com/questions/1986491/sql-split-string-by-space-into-table-in-postgresql/1993058#1993058
Had a look at the source tree, and also felt good.
If Oracle is the Microsoft of database, Postgres is the Linux, and MySQL (or more precisely MariaDB) is the FreeBSD (i.e. the one that got delayed by legal issues). Except that their software licenses were accidentally swapped.
The only problem with Postgres is its name. PostgreSQL is so unpronounceable and so untypeable that you should just call it "Postgres" like everyone else.
On Ubuntu 20.10 PostgreSQL 12.6, login with
psql
on my default username without sudo fails with: stackoverflow.com/questions/11919391/postgresql-error-fatal-role-username-does-not-existThis is the one that worked on Ubuntu 21.04: stackoverflow.com/questions/11919391/postgresql-error-fatal-role-username-does-not-exist/38444152#38444152Explanation:
sudo -u postgres createuser -s $(whoami)
createdb $(whoami)
sudo -u postgres
uses thepostgres
user via peer authentication-s
increateuser -s
: make it a superusercreatedb
: TODO why do we have to create a table with the same name as the user? Otherwise login fails.
You can now run shows that peer authentication is available to all users apparently:
psql
without any password. This works without password due to peer authentication, sudo cat /etc/postgresql/12/main/pg_hba.conf
local all postgres peer
# TYPE DATABASE USER ADDRESS METHOD
# "local" is for Unix domain socket connections only
local all all peer
List users:output:
psql -c '\du'
List of roles
Role name | Attributes | Member of
-------------+------------------------------------------------------------+-----------
ciro | Superuser, Create role, Create DB | {}
owning_user | | {}
postgres | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
Delete user later on:
psql -c 'DROP USER username;'
Create a database:
createdb testdb0
Help toplevel:
help
Get help for Postgres commands such as
\h
and so on:\?
List supported SQL commands:
\h
Show syntax for one type of command:
\h SELECT
List all databases:which shows:
psql -c '\l'
Name | Owner | Encoding | Collate | Ctype | Access privileges
-------------+----------+----------+-------------+-------------+-----------------------
ciro | postgres | UTF8 | en_GB.UTF-8 | en_GB.UTF-8 |
postgres | postgres | UTF8 | en_GB.UTF-8 | en_GB.UTF-8 |
template0 | postgres | UTF8 | en_GB.UTF-8 | en_GB.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_GB.UTF-8 | en_GB.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
testdb0 | postgres | UTF8 | en_GB.UTF-8 | en_GB.UTF-8 |
(6 rows)
Delete a database:
psql -c 'DROP DATABASE "testdb0";'
If you didn't give a database from the command line e.g.:you can do that afterwards with:
psql
\c testdb0
Let's create a table and test that it is working:
psql testdb0 -c 'CREATE TABLE table0 (int0 INT, char0 CHAR(16));'
List tables, no special tables:gives:
psql testdb0 -c '\dt'
List of relations
Schema | Name | Type | Owner
--------+--------+-------+-------
public | table0 | table | ciro
(1 row)
View table schema: stackoverflow.com/questions/109325/postgresql-describe-tableoutput:
psql testdb0 -c '\d+ table0'
Table "public.table0"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------------+-----------+----------+---------+----------+--------------+-------------
int0 | integer | | | | plain | |
char0 | character(16) | | | | extended | |
Insert some data into it and get the data out:output:
psql testdb0 -c "INSERT INTO table0 (int0, char0) VALUES (2, 'two'), (3, 'three'), (5, 'five'), (7, 'seven');"
psql testdb0 -c 'SELECT * FROM table0;'
int0 | char0
------+------------------
2 | two
3 | three
5 | five
7 | seven
(4 rows)
Delete the table:
psql testdb0 -c 'DROP TABLE table0;'
- output one column per line: stackoverflow.com/questions/9604723/alternate-output-format-for-psql-showing-one-column-per-line-with-column-name
- PostgreSQL does not automatically index foreign keys! stackoverflow.com/questions/970562/postgres-and-indexes-on-foreign-keys-and-primary-keys
This one is good: stackoverflow.com/questions/36533429/generate-random-string-in-postgresql/44200391#44200391 as it also describes how to generate multiple values.
with symbols(characters) as (VALUES ('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'))
select string_agg(substr(characters, (random() * (length(characters) - 1) + 1)::INTEGER, 1), '')
from symbols
join generate_series(1,8) as word(chr_idx) on 1 = 1 -- word length
join generate_series(1,10000) as words(idx) on 1 = 1 -- # of words
group by idx;
Then you can insert it into a row with:
create table tmp(s text);
insert into tmp(s)
select s from
(
with symbols(characters) as (VALUES ('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'))
select string_agg(substr(characters, (random() * (length(characters) - 1) + 1)::INTEGER, 1), '') as asdf
from symbols
join generate_series(1,8) as word(chr_idx) on 1 = 1 -- word length
join generate_series(1,10000) as words(idx) on 1 = 1 -- # of words
group by idx
) as sub(s);
A more convenient approach is likely to define the function:
CREATE OR REPLACE FUNCTION random_string(int) RETURNS TEXT as $$
select
string_agg(substr(characters, (random() * length(characters) + 1)::integer, 1), '') as random_word
from (values('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 --')) as symbols(characters)
join generate_series(1, $1) on 1 = 1
$$ language sql;
And then:
create table tmp(s text, t text);
insert into tmp(s) select random_string(10) from generate_series(10);
In order to create a test user with password instead of peer authentication, let's create test user:
createuser -P user0
createdb user0
-P
makes it prompt for the users password.Alternatively, to create the password non-interactively stackoverflow.com/questions/42419559/postgres-createuser-with-password-from-terminal:Can't find a way using the
psql -c "create role NewRole with login password 'secret'"
createuser
helper.We can then login with that password with:which asks for the password we've just set, because the
psql -U user0 -h localhost
-h
option turns off peer authentication, and turns off password authentication.The password can be given non-interactively as shown at stackoverflow.com/questions/6405127/how-do-i-specify-a-password-to-psql-non-interactively with the
PGPASSWORD
environment variable:PGPASSWORD=a psql -U user0 -h localhost
Now let's create a test database which
user0
can access with an existing superuser account:createdb user0db0
psql -c 'GRANT ALL PRIVILEGES ON DATABASE user0db0 TO user0'
We can check this permission with:which now contains:The permission letters are explained at:
psql -c '\l'
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
user0db0 | ciro | UTF8 | en_GB.UTF-8 | en_GB.UTF-8 | =Tc/ciro +
| | | | | ciro=CTc/ciro +
| | | | | user0=CTc/ciro
user0
can now do the usual table operations on that table:PGPASSWORD=a psql -U user0 -h localhost user0db0 -c 'CREATE TABLE table0 (int0 INT, char0 CHAR(16));'
PGPASSWORD=a psql -U user0 -h localhost user0db0 -c "INSERT INTO table0 (int0, char0) VALUES (2, 'two'), (3, 'three'), (5, 'five'), (7, 'seven');"
PGPASSWORD=a psql -U user0 -h localhost user0db0 -c 'SELECT * FROM table0;'
Ubuntu 21.10 has a certain default level of logging by default to:but it does not log everything, only/mostly errors it seems.
/var/log/postgresql/postgresql-13-main.log
Setting:under:and then restarting the server:just works.
log_statement = 'all'
/etc/postgresql/13/main/postgresql.conf
sudo service restart postgresql
Realtime monitoring for long queries instead: stackoverflow.com/questions/8597516/app-to-monitor-postgresql-queries-in-real-time
When using SQL REPEATABLE READ isolation level and SQL SERIALIZABLE isolation level, concurrent transactions may fail with a serialization failure, and then you might need to retry them. You server code or your ORM must always account for that.
A good way to explore when it happens is to use the example
Related questions:
- stackoverflow.com/questions/7705273/what-are-the-conditions-for-encountering-a-serialization-failure
- stackoverflow.com/questions/59351109/error-could-not-serialize-access-due-to-concurrent-update
- stackoverflow.com/questions/50797097/postgres-could-not-serialize-access-due-to-concurrent-update/51932824
Often known simply as SQL Server, a terrible thing that makes it impossible to find portable SQL answers on Google! You just have to Google by specific SQL implementation unfortunately to find anything about the open source ones.
The minimalism, serverlessness/lack of temporary caches/lack of permission management, Hipp's religious obsession with efficiency, the use of their own pure Fossil version control[ref]. Wait, scrap that last one. Pure beauty!
Official Git mirror: github.com/sqlite/sqlite
Create a table
sqlite3 db.sqlite3 "
CREATE TABLE 'IntegerNames' (int0 INT, char0 CHAR(16));
INSERT INTO 'IntegerNames' (int0, char0) VALUES (2, 'two'), (3, 'three'), (5, 'five'), (7, 'seven');
"
List tables:output:
sqlite3 db.sqlite3 '.tables'
IntegerNames
Show schema of a table:outputs the query that would generate that table:
sqlite3 db.sqlite3 '.schema IntegerNames'
CREATE TABLE IF NOT EXISTS 'IntegerNames' (int0 INT, char0 CHAR(16));
Show all data in a table:output:
sqlite3 db.sqlite3 'SELECT * FROM IntegerNames'
2|two
3|three
5|five
7|seven
Python sequence test data generation: stackoverflow.com/questions/18219779/bulk-insert-huge-data-into-sqlite-using-python/76659706#76659706
Example: sqlite/ip.c, adapted from www.sqlite.org/loadext.html, also mentioned explained at: stackoverflow.com/questions/7638238/sqlite-ip-address-storage/76520885#76520885.
Sample usage in the test program: sqlite/test.sh.
____
The default isolation level for SQLite is SERIALIZABLE
____
The default isolation level for SQLite is SERIALIZABLE
____
It does not appear possible to achieve the other two levels besides SERIALIZABLE and READ UNCOMMITED
Includes its own copy of sqlite3, you don't use the system one, which is good to ensure compatibility. The version is shown at: github.com/mapbox/node-sqlite3/blob/918052b538b0effe6c4a44c74a16b2749c08a0d2/deps/common-sqlite.gypi#L3 SQLite source is tracked compressed in-tree: github.com/mapbox/node-sqlite3/blob/918052b538b0effe6c4a44c74a16b2749c08a0d2/deps/sqlite-autoconf-3360000.tar.gz horrendous. This explains why it takes forever to clone that repository. People who don't believe in git submodules, there's even an official Git mirror at: github.com/sqlite/sqlite
It appears to spawn its own threads via its C extension (since JavaScript is single threaded and and SQLite is not server-based), which allows for parallel queries using multiple threads: github.com/mapbox/node-sqlite3/blob/v5.0.2/src/threading.h
Hello world example: nodejs/node-sqlite3/index.js.
As of 2021, this had slumped back a bit, as maintainers got tired. Unmerged pull requests started piling more, and
better-sqlite3
Node.js package started pulling ahead a little.- github.com/mapbox/node-sqlite3/issues/1381
FATAL ERROR: Error::ThrowAsJavaScriptException napi_throw
with Node.jsworker_threads
vsbetter-sqlite3
Node.js package github.com/JoshuaWise/better-sqlite3/issues/237
As claimed on their README, their operation truly appears to be 10x faster than the node-sqlite package!! It is insane!! How can that other package still exist at all?
The only big problem was the lack of ORM, but people are looking into that by adding it to Sequelize:
Have a look at some interesting examples under nodejs/sequelize/raw/many_to_many.js.
- SQLite with
rowid
: stackoverflow.com/questions/8190541/deleting-duplicate-rows-from-sqlite-database - SQL Server has crazy "CTEs" change backing table extension: stackoverflow.com/questions/18390574/how-to-delete-duplicate-rows-in-sql-server
UPSERT
is extremely handy, and reduces the number of find, check on server, update loops. But RETURNING
is a fundamental part of that (to get the updated/existing) ID. Can't believe SQL hasn't standardized it yet as of 2022. But both SQLite and Postgres support it with similar syntax thankfully.Attempt at nodejs/sequelize/raw/upsert.js:
- stackoverflow.com/questions/48816629/on-conflict-do-nothing-in-postgres-with-a-not-null-constraint OP unable to provide a minimal exampe, but it is likely the problem
- dba.stackexchange.com/questions/292428/postgresql-upsert-issue-with-not-null-columns
Related on more complex constraints:
It is mind blowing that this is not possible... the only way to avoid ambiguity in JOINs with column name conflicts is to give aliases to each column...
An example where
SELECT FOR UPDATE
is a good solution to an use case can be seen at: nodejs/sequelize/raw/parallel_select_and_update.js.SELECT FOR UPDATE
vs/together with the SQL transaction isolation level is commented at: stackoverflow.com/questions/10935850/when-to-use-select-for-update.SQL's implementation of database triggers.
This feature is really cool, as it allows you to keep caches up to date!
In particular, everything that happens in a trigger happens as if it were in a transaction. This way, you can do less explicit transactions when you use triggers. It is a bit like the advantages of SQL CASCADE.
DBMS:
TODO what is the standard compliant syntax?
PostgreSQL requires you to define a SQL stored procedure: stackoverflow.com/questions/28149494/is-it-possible-to-create-trigger-without-execute-procedure-in-postgresql Their syntax may be standard compliant, not sure about the
EXECUTE
part. Their docs: www.postgresql.org/docs/current/sql-createtrigger.htmlSQLite does not support SQL stored procedures at all, so maybe that's why they can't be standard compliant here: stackoverflow.com/questions/3335162/creating-stored-procedure-in-sqlite
SQL:1999 11.38 covers "Trigger definition". The Abstract syntax tree starts with the
CREATE TRIGGER
and ends in:<triggered SQL statement> ::=
<SQL procedure statement>
This is defined at 13.5 "SQL procedure statement", but that is humongous and I'm not sure what it is at all.
In this example we cache track the number of posts per user on a cache column.
Basic example tested on SQLite 3.40.1, Ubuntu 23.04:output:
sqlite3 :memory: 'select 1 union select 2'
1
2
Two columns two rows:output:
sqlite3 :memory: <<EOF
select * from (values (1, 2), (2, 3))
union
select * from (values (2, 3), (3, 4))
EOF
1|2
2|3
3|4
Note how duplicates are removed, to keep them we output:
UNION ALL
instead:sqlite3 :memory: <<EOF
select * from (values (1, 2), (2, 3))
union all
select * from (values (2, 3), (3, 4))
EOF
1|2
2|3
2|3
3|4
This happens when you calculate a bunch of values on your program, and then want to save the to SQL.
Bibliography:
Dumping examples under nodejs/sequelize/raw/many_to_many.js.
Not possible without subqueries in the standard syntax, a huge shame: stackoverflow.com/questions/1293330/how-can-i-do-an-update-statement-with-join-in-sql-server
Demo under: nodejs/sequelize/raw/many_to_many.js.
NO way in the SQL standard apparently, but you'd hope that implementation status would be similar to UPDATE with JOIN, but not even!
- PostgreSQL: possible with
DELETE FROM USING
: stackoverflow.com/questions/11753904/postgresql-delete-with-inner-join - SQLite: not possible without subqueries as of 3.35 far: stackoverflow.com/questions/24511153/how-delete-table-inner-join-with-other-table-in-sqlite, Does not appear to have any relevant features at: www.sqlite.org/lang_delete.html
ORM
- Sequelize: no support of course: stackoverflow.com/questions/40890131/sequelize-destroy-record-with-join
A quick look at PostgreSQL's compliance notes: www.postgresql.org/docs/13/features.html shows the complete utter mess that this standard is. Multiple compliance levels that no one fully implements and optional features everywhere.
OK, there's a billion questions:
- SQL Server
- stackoverflow.com/questions/485409/generating-a-histogram-from-column-values-in-a-database OP did not know the difference between count and histogram :-) But it's the number one Google result.
- stackoverflow.com/questions/19103991/create-range-bins-from-sql-server-table-for-histograms has a minor extra group by twist, but otherwise fine
- stackoverflow.com/questions/16268441/generate-histogram-in-sql-server
- SQLite
- stackoverflow.com/questions/67514208/how-to-optimise-creating-histogram-bins-in-sqlite perf only, benchmarking would be needed. SQLite.
- stackoverflow.com/questions/32155449/create-a-histogram-with-a-dynamic-number-of-partitions-in-sqlite variable bin size, same number of entries per bin
- stackoverflow.com/questions/60348109/histogram-for-time-periods-using-sqlite-regular-buckets-1h-wide time
- MySQL: stackoverflow.com/questions/1764881/getting-data-for-histogram-plot MySQL appears to extend
ROUND
to also round by integers:ROUND(numeric_value, -2)
, but this is not widely portable which is a shame - stackoverflow.com/questions/72367652/populating-empty-bins-in-a-histogram-generated-using-sql specifically asks about empty bins, which is amazing. Amazon Redshift dialect unfortunately, but answer provided works widely, and Redshift was forked from PostgreSQL, so there's hope. Those newb open source server focused projects that don't use AGPL!
Let's try it on SQLite 3.40.1, Ubuntu 23.04. Data setup:
sqlite3 tmp.sqlite 'create table t(x integer)'
sqlite3 tmp.sqlite <<EOF
insert into t values (
0,
2,
2,
3,
5,
6,
6,
8,
9,
17,
)
EOF
sqlite3 tmp.sqlite 'create index tx on t(x)'
For a bin size of 5 ignoring empty ranges we can:which produces the desired:
sqlite3 tmp.sqlite <<EOF
select floor(x/5)*5 as x,
count(*) as cnt
from t
group by 1
order by 1
EOF
0|4
5|5
15|1
And to consider empty ranges we can use SQL which outputs the desired:
genenerate_series
+ as per stackoverflow.com/questions/72367652/populating-empty-bins-in-a-histogram-generated-using-sql:sqlite3 tmp.sqlite <<EOF
select x, sum(cnt) from (
select floor(x/5)*5 as x,
count(*) as cnt
from t
group by 1
union
select *, 0 as cnt from generate_series(0, 15, 5)
)
group by x
EOF
0|4
5|5
10|0
15|1
Let's try it on SQLite 3.40.1, Ubuntu 23.04. Data setup:
sqlite3 tmp.sqlite 'create table t(x integer, y integer)'
sqlite3 tmp.sqlite <<EOF
insert into t values
(0, 0),
(1, 1),
(2, 2),
(3, 3),
(4, 4),
(5, 5),
(6, 6),
(7, 7),
(8, 8),
(9, 9),
(10, 10),
(11, 11),
(12, 12),
(13, 13),
(14, 14),
(15, 15),
(16, 16),
(17, 17),
(18, 18),
(19, 19),
(2, 18)
EOF
sqlite3 tmp.sqlite 'create index txy on t(x, y)'
For a bin size of 5 ignoring empty ranges we can:which produces the desired:
sqlite3 tmp.sqlite <<EOF
select
floor(x/5)*5 as x,
floor(y/5)*5 as y,
count(*) as cnt
from t
group by 1, 2
order by 1, 2
EOF
0|0|5
0|15|1
5|5|5
10|10|5
15|15|5
And to consider empty ranges we can use SQL which outputs the desired:
genenerate_series
+ as per stackoverflow.com/questions/72367652/populating-empty-bins-in-a-histogram-generated-using-sql:sqlite3 tmp.sqlite <<EOF
select x, y, sum(cnt) from (
select
floor(x/5)*5 as x,
floor(y/5)*5 as y,
count(*) as cnt
from t
group by 1, 2
union
select *, 0 as cnt from generate_series(0, 15, 5) inner join (select * from generate_series(0, 15, 5))
)
group by x, y
EOF
0|0|5
0|5|0
0|10|0
0|15|1
5|0|0
5|5|5
5|10|0
5|15|0
10|0|0
10|5|0
10|10|5
10|15|0
15|0|0
15|5|0
15|10|0
15|15|5
Example: nodejs/sequelize/raw/tree.js
- Implementation agnostic
- stackoverflow.com/questions/192220/what-is-the-most-efficient-elegant-way-to-parse-a-flat-table-into-a-tree
- stackoverflow.com/questions/5508985/recursive-query-for-adjacency-list-to-preorder-tree-traversal-in-sql DBMS agnostic specifically asking not to modify adjacenty list data structure
- Postgres
- stackoverflow.com/questions/67848017/simple-recursive-sql-query
- stackoverflow.com/questions/28688264/how-to-traverse-a-hierarchical-tree-structure-structure-backwards-using-recursiv
- stackoverflow.com/questions/51822070/how-can-postgres-represent-a-tree-of-row-ids
- depth first
- uspecified depth first variant
- preorder DFS
- breadth-first stackoverflow.com/questions/3709292/select-rows-from-table-using-tree-order
- MySQL
- stackoverflow.com/questions/8252323/mysql-closure-table-hierarchical-database-how-to-pull-information-out-in-the-c asks how to use a specific order (preorder DFS) with closure table
- Microsoft SQL Server
How to implement Nested set model in SQL:
- stackoverflow.com/questions/192220/what-is-the-most-efficient-elegant-way-to-parse-a-flat-table-into-a-tree/42781302#42781302 contains the correct left/size representation and update queries, which makes it much easier to maintain the tree without having to worry about the sizes of siblings which are constant
- stackoverflow.com/questions/192220/what-is-the-most-efficient-elegant-way-to-parse-a-flat-table-into-a-tree/194031#194031 amazing ASCII art representations of the structure. Unfortunatly uses a wonky left/right representation, rather than the much more natural left/size representation from the other post
Minimal example: nodejs/sequelize/raw/recursive.js
More advanced SQL tree traversal examples: nodejs/sequelize/raw/tree.js
Example under: nodejs/sequelize/raw/tree.js
The highly underdocumented built-in module, that supports SQL spatial index and a lot more.
Quite horrendous as it only seems to work on geometric types and not existing columns. But why.
And it uses custom operatores, where standard operators would have been just fine for points...
Minimal runnable example with points:The index creation unfortunately took 100s, so it will not scale to 1B points very well whic his a shame.
set -x
time psql -c 'drop table if exists t'
time psql -c 'create table t(p point)'
time psql -c "insert into t select (point ('(' || generate_series || ',' || generate_series || ')')) from generate_series(1, 10000000)"
time psql -c 'create index on t using gist(p)'
time psql -c "select count(*) from t where p <@ box '(1000000,1000000),(9000000,2000000)'"
Some sources about it:
The third part module, which clutters up any serches you make for the built-in one.
Similar to SQL subquery, but with some differences: stackoverflow.com/questions/706972/difference-between-cte-and-subquery
rm -f tmp.sqlite
sqlite3 tmp.sqlite 'create table t(i integer)'
sqlite3 tmp.sqlite 'insert into t values (1), (2)'
sqlite3 tmp.sqlite 'with mycte as ( select * from t ) delete from mycte where i = 1'
sqlite3 tmp.sqlite 'select * from t'
Useful for testing: stackoverflow.com/questions/21819183/how-to-use-ctes-with-update-delete-on-sqlite
sqlite3 :memory: 'WITH t (i, j) AS (VALUES (1, -1), (2, -2)) SELECT * FROM t'
Each transaction isolation level specifies what can or cannot happen when two queries are being run in parallel, i.e.: the memory semantics of the system.
Remember that queries can affects thousands of rows, and database systems like PostgreSQL can run multiple such queries at the same time.
Good summary on the PostgreSQL page: www.postgresql.org/docs/14/transaction-iso.html
Implementation specifics:
Example where this level is sufficient: nodejs/sequelize/raw/parallel_update_async.js.
Vs SQL SERIALIZABLE isolation level on PostgreSQL: dba.stackexchange.com/questions/284744/postgres-repeatable-read-vs-serializable
nodejs/sequelize/raw/parallel_create_delete_empty_tag.js is an example which experimentally seems to be solved by
REAPEATABLE READ
, although we are not sure that this is truly the case and why. What is clear is that that example is not solved by the SQL READ COMMITTED isolation level.In PostgreSQL, this is the first isolation level which can lead to postgreSQL serialization failures, this does not happen to SQL READ COMMITTED isolation level in that DBMS. You then have to retry the transaction.
nodejs/sequelize/raw/parallel_update_worker_threads.js contains a base example that can be used to test what can happen when queries are being run in parallel. But it is broken due to a
sqlite3
Node.js package bug: github.com/mapbox/node-sqlite3/issues/1381...nodejs/sequelize/raw/parallel_update_async.js is an
async
version of it. It should be just parallel enough to allow observing the same effects.This is an example of a transaction where the SQL READ COMMITTED isolation level if sufficient.
These examples run queries of type:
UPDATE "MyInt" SET i = i + 1
Sample execution:which does:
node --unhandled-rejections=strict ./parallel_update_async.js p 10 100
- PostgreSQL, see other databases options at SQL example
- 10 threads
- 100 increments on each thread
The fear then is that of a classic read-modify-write failure.
But as www.postgresql.org/docs/14/transaction-iso.html page makes very clear, including with an explicit example of type
UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 12345;
, that the default isolation level, SQL READ COMMITTED isolation level, already prevents any problems with this, as the update always re-reads selected rows in case they were previously modified.If the first updater commits, the second updater will ignore the row if the first updater deleted it, otherwise it will attempt to apply its operation to the updated version of the row
Since in PostgreSQL "Read uncommitted" appears to be effectively the same as "Read committed", we won't be able to observe any failures on that database system for this example.
nodejs/sequelize/raw/parallel_create_delete_empty_tag.js contains an example where things can actually blow up in read committed.
This example is similar to nodejs/sequelize/raw/parallel_update_async.js, but now we are doing a separate SELECT, later followed by an update:
SELECT FROM
to get i- update on Js code
newI = i + 1
UPDATE SET
thenewI
Although this specific example is useless in itself, as we could just use
UPDATE "MyInt" SET i = i + 1
as in nodejs/sequelize/raw/parallel_update_async.js, which automatically solves any concurrency issue, this kind of code could be required for example if the update was a complex function not suitably implemented in SQL, or if the update depends on some external data source.Sample execution:which does:
node --unhandled-rejections=strict ./parallel_select_and_update.js p 2 10 'READ COMMITTED'
- PostgreSQL, see other databases options at SQL example
- 2 threads
- 10 increments on each thread
Another one:this will run SELECT FOR UPDATE rather than just SELECT
node --unhandled-rejections=strict ./parallel_select_and_update.js p 2 10 'READ COMMITTED' 'FOR UPDATE'
Observed behaviour under different SQL transaction isolation levels:
READ COMMITTED
: fails. Nothing in this case prevents:- thread 1: SELECT, obtains i = 0
- thread 2: SELECT, obtains i = 0
- thread 2: newI = 1
- thread 2: UPDATE i = 1
- thread 1: newI = 1
- thread 1: UPDATE i = 1
REPEATABLE READ
: works. the manual mentions that if multiple concurrent updates would happen, only the first commit succeeds, and the following ones fail and rollback and retry, therefore preventing the loss of an update.READ COMMITTED
+SELECT FOR UPDATE
: works. And does not do rollbacks, which probably makes it faster. Withp 10 100
,REPEATABLE READ
was about 4.2s andREAD COMMITTED
+SELECT FOR UPDATE
3.2s on Lenovo ThinkPad P51 (2017).SELECT FOR UPDATE
should be enough as mentioned at: www.postgresql.org/docs/13/explicit-locking.html#LOCKING-ROWSFOR UPDATE causes the rows retrieved by the SELECT statement to be locked as though for update. This prevents them from being locked, modified or deleted by other transactions until the current transaction ends. That is, other transactions that attempt UPDATE, DELETE, SELECT FOR UPDATE, SELECT FOR NO KEY UPDATE, SELECT FOR SHARE or SELECT FOR KEY SHARE of these rows will be blocked until the current transaction ends; conversely, SELECT FOR UPDATE will wait for a concurrent transaction that has run any of those commands on the same row, and will then lock and return the updated row (or no row, if the row was deleted). Within a REPEATABLE READ or SERIALIZABLE transaction, however, an error will be thrown if a row to be locked has changed since the transaction started. For further discussion see Section 13.4.
A non-raw version of this example can be seen at: nodejs/sequelize/parallel_select_and_update.js.
This example contains a deterministic demo of when postgreSQL serialization failures may happen.
Tested on PostgreSQL 13.5.
Articles were limited to the first 100 out of 105 total. Click here to view all children of Type of database.
Articles by others on the same topic
There are currently no matching articles.