Software that implements a database system, e.g. PostgreSQL and MySQL are two widely used SQL implementations.
- SQLite: with `rowid`: stackoverflow.com/questions/8190541/deleting-duplicate-rows-from-sqlite-database
- SQL Server: has a crazy CTE "change backing table" extension: stackoverflow.com/questions/18390574/how-to-delete-duplicate-rows-in-sql-server
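The SQLite `rowid` approach boils down to something like this sketch (table and column names are made up for illustration):

```sql
-- Keep only the row with the smallest rowid within each group of duplicates.
DELETE FROM mytable
WHERE rowid NOT IN (
  SELECT MIN(rowid) FROM mytable GROUP BY col1, col2
);
```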
Demo under: nodejs/sequelize/raw/many_to_many.js.
There is NO way in the SQL standard apparently. You'd hope that implementation status would be similar to UPDATE with JOIN, but not even!
- PostgreSQL: possible with `DELETE FROM ... USING`: stackoverflow.com/questions/11753904/postgresql-delete-with-inner-join
- SQLite: not possible without subqueries as of 3.35: stackoverflow.com/questions/24511153/how-delete-table-inner-join-with-other-table-in-sqlite. Does not appear to have any relevant features at: www.sqlite.org/lang_delete.html
ORM
- Sequelize: no support of course: stackoverflow.com/questions/40890131/sequelize-destroy-record-with-join
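As a rough sketch of the dialect difference (table and column names are made up for illustration):

```sql
-- PostgreSQL: DELETE FROM ... USING acts like a join.
DELETE FROM posts
USING users
WHERE posts.user_id = users.id
  AND users.banned = TRUE;

-- SQLite: the same effect requires a subquery.
DELETE FROM posts
WHERE user_id IN (SELECT id FROM users WHERE banned = 1);
```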
The default feathers-chat app runs on NeDB (local filesystem JSON database).
Ciro Santilli managed to port it to Sequelize for PostgreSQL as shown at: github.com/cirosantilli/feathers-chat/tree/sequelize-pg
Bibliography:
There's also a `heroku` branch at: github.com/feathersjs/feathers-chat/tree/heroku, but it also seems to use NeDB? So can you have a filesystem on Heroku? Doesn't seem so: stackoverflow.com/questions/42775418/heroku-local-persistent-storage

nodejs/sequelize/parallel_select_and_update.js
This example is the same as nodejs/sequelize/raw/parallel_select_and_update.js, but going through the Sequelize API rather than raw queries.
`NONE`, i.e. not having a transaction at all, is not supported for now, because lazy. The example illustrates: stackoverflow.com/questions/55452441/for-share-and-for-update-statements-in-sequelize
Sample invocation:
node --unhandled-rejections=strict ./parallel_select_and_update.js p 10 100 READ_COMMITTED UPDATE
where:
- `READ_COMMITTED`: one of the keys documented at: sequelize.org/master/class/lib/transaction.js~Transaction.html, which correspond to the standard SQL isolation levels. If not given, don't set one, defaulting to the database's/Sequelize's default level.
- `UPDATE`: one of the keys documented at: sequelize.org/master/class/lib/transaction.js~Transaction.html#static-get-LOCK. `UPDATE` generates a `SELECT FOR UPDATE` in PostgreSQL for example. If not given, don't use any `FOR xxx` explicit locking.
Then, the outcome is exactly as described at: nodejs/sequelize/raw/parallel_select_and_update.js:
- `READ_COMMITTED`: fails
- `READ_COMMITTED UPDATE`: works
- `REPEATABLE_READ`: works, but is a bit slower, as it does rollbacks. This case also illustrates Sequelize transaction retries, since in this transaction isolation level transactions may fail.
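Roughly, with `READ_COMMITTED` and the `UPDATE` lock, each iteration should issue PostgreSQL statements along these lines (a sketch; the exact table name and SQL that Sequelize generates may differ):

```sql
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT i FROM "MyInt" LIMIT 1 FOR UPDATE;
-- newI = i + 1 is computed in JavaScript between the two statements
UPDATE "MyInt" SET i = $1;  -- $1 bound to newI
COMMIT;
```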
nodejs/sequelize/raw/parallel_create_delete_empty_tag.js
In this example, posts have tags. When a post is deleted, we want to delete any tags that the post deletion may have left empty.
If we are creating and deleting posts concurrently, a naive implementation might wrongly delete the tags of a newly created post.
This could be due to a concurrency issue of the following types.
Failure case 1, which would result in the new post incorrectly not having `tag0`:
- thread 2: delete old post
- thread 2: find all tags with 0 posts. Finds `tag0` from the deleted old post, which is now empty.
- thread 1: create new post, which we want to have tag `tag0`
- thread 1: try to create a new tag `tag0`, but don't because it already exists. This is done using SQLite's `INSERT OR IGNORE INTO` or PostgreSQL's `INSERT ... ON CONFLICT DO NOTHING`
- thread 1: assign `tag0` to the new post by adding an entry to the join table
- thread 2: delete all tags with 0 posts. It still sees from its previous search that `tag0` is empty, and deletes it, which then cascades into the join table entry of `tag0`.
Failure case 2, which leads to a foreign key failure, because the tag does not exist anymore when the assignment happens:
- thread 2: delete old post
- thread 2: find all tags with 0 posts
- thread 1: create new post
- thread 1: try to create a new tag `tag0`, but don't because it already exists
- thread 2: delete all tags with 0 posts. It still sees from its previous search that `tag0` is empty, and deletes it
- thread 1: assign `tag0` to the new post
Failure case 3, which also leads to a foreign key failure, because the tag does not exist anymore when the assignment happens:
- thread 2: delete old post
- thread 1: create new post, which we want to have tag `tag0`
- thread 1: try to create a new tag `tag0`, and succeed because it wasn't present
, and succeed because it wasn't present - thread 2: find all tags with 0 posts, finds the tag that was just created
- thread 2: delete all tags with 0 posts, deleting the new tag
- thread 1: assign `tag0` to the new post
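For concreteness, the core statements of the two threads look roughly like this in PostgreSQL syntax (table and column names here are illustrative, not necessarily the exact ones the demo generates):

```sql
-- thread 1: create the tag if it does not exist yet
INSERT INTO "Tags" (name) VALUES ('tag0') ON CONFLICT DO NOTHING;
-- thread 1: assign the tag to the new post via the join table
INSERT INTO "PostTag" ("PostId", "TagId") VALUES (1, 1);

-- thread 2: delete all tags that have no posts left
DELETE FROM "Tags"
WHERE id NOT IN (SELECT "TagId" FROM "PostTag");
```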
Sample executions. All executions use 2 threads:
- node --unhandled-rejections=strict ./parallel_create_delete_empty_tag.js p 9 1000 'READ COMMITTED': PostgreSQL, 9 tags, DELETE/CREATE the `tag0` test tag 1000 times, use `READ COMMITTED`.
  Execution often fails, although not always. The failure is always:
  error: insert or update on table "PostTag" violates foreign key constraint "PostTag_tagId_fkey"
  because the `INSERT INTO "PostTag"` tries to insert a tag that was deleted in the other thread, as it didn't have any corresponding posts, so this is the foreign key failure.
  TODO: we've never managed to observe the failure case in which `tag0` is deleted. Is it truly possible? And if not, by which guarantee?
- node --unhandled-rejections=strict ./parallel_create_delete_empty_tag.js p 9 1000 'READ COMMITTED' 'FOR UPDATE': do a `SELECT ... FOR UPDATE` before trying to `INSERT`.
  This is likely correct, and the fastest correct method according to our quick benchmarking, about 20% faster than `REPEATABLE READ`.
  We are just not 100% sure it is correct, because we can't find out if the `SELECT` in the `DELETE` subquery could first select some rows, which are then locked by the tag creator, and only then locked by `DELETE` after selection. Or does it re-evaluate the `SELECT` even though it is in a subquery?
- node --unhandled-rejections=strict ./parallel_create_delete_empty_tag.js p 9 1000 'REPEATABLE READ': repeatable read.
  We've never observed any failures with this level. This should likely fix the foreign key issue according to the PostgreSQL docs, since:
  - the `DELETE "Post"` commit cannot start to be seen only in the middle of the thread 1 transaction
  - and then if the DELETE happened, the thread 1 transaction will detect it, ROLLBACK, and re-run. TODO: how does it detect the need to rollback? Is it because of the foreign key? It is very hard to be sure about this kind of thing; we just can't find the information. Related: PostgreSQL serialization failure.
- node --unhandled-rejections=strict ./parallel_create_delete_empty_tag.js p 9 1000 'SERIALIZABLE': serializable.
- node --unhandled-rejections=strict ./parallel_create_delete_empty_tag.js p 9 1000 'NONE': magic value, don't use any transaction. Can blow up of course, since it has even fewer restrictions than `READ COMMITTED`.
Some theoretical notes:
- Failure case 3 is averted by a `READ COMMITTED` transaction, because thread 2 won't see the uncommitted tag that thread 1 created, and therefore won't be able to delete it
stackoverflow.com/questions/10935850/when-to-use-select-for-update, from SELECT FOR UPDATE, also talks about a similar example and has relevant answers.
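The `FOR UPDATE` fix essentially makes thread 1 take a row lock on the tag before touching the join table, something like this sketch (same illustrative names as in the sketch above):

```sql
-- thread 1, inside its transaction, before the join-table INSERT:
SELECT * FROM "Tags" WHERE name = 'tag0' FOR UPDATE;
```

Thread 2's `DELETE` then has to block on that row until thread 1 commits.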
nodejs/sequelize/raw/parallel_select_and_update_deterministic.js
This example contains a deterministic demo of when PostgreSQL serialization failures may happen.
Tested on PostgreSQL 13.5.
nodejs/sequelize/raw/parallel_select_and_update.js
This example is similar to nodejs/sequelize/raw/parallel_update_async.js, but now we are doing a separate SELECT, later followed by an update:
- `SELECT FROM` to get i
- update in JS code: `newI = i + 1`
- `UPDATE SET` the `newI`
Although this specific example is useless in itself, as we could just use `UPDATE "MyInt" SET i = i + 1` as in nodejs/sequelize/raw/parallel_update_async.js, which automatically solves any concurrency issue, this kind of code could be required for example if the update was a complex function not suitably implemented in SQL, or if the update depends on some external data source.
Sample execution:
node --unhandled-rejections=strict ./parallel_select_and_update.js p 2 10 'READ COMMITTED'
which does:
- PostgreSQL, see other database options at SQL example
- 2 threads
- 10 increments on each thread
Another one:
node --unhandled-rejections=strict ./parallel_select_and_update.js p 2 10 'READ COMMITTED' 'FOR UPDATE'
This will run `SELECT ... FOR UPDATE` rather than just `SELECT`.
Observed behaviour under different SQL transaction isolation levels:
- `READ COMMITTED`: fails. Nothing in this case prevents:
  - thread 1: SELECT, obtains i = 0
  - thread 2: SELECT, obtains i = 0
  - thread 2: newI = 1
  - thread 2: UPDATE i = 1
  - thread 1: newI = 1
  - thread 1: UPDATE i = 1
- `REPEATABLE READ`: works. The manual mentions that if multiple concurrent updates would happen, only the first commit succeeds, and the following ones fail, rollback and retry, therefore preventing the loss of an update.
- `READ COMMITTED` + `SELECT FOR UPDATE`: works, and does not do rollbacks, which probably makes it faster. With `p 10 100`, `REPEATABLE READ` was about 4.2s and `READ COMMITTED` + `SELECT FOR UPDATE` 3.2s on a Lenovo ThinkPad P51 (2017). `SELECT FOR UPDATE` should be enough as mentioned at: www.postgresql.org/docs/13/explicit-locking.html#LOCKING-ROWS
  "FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as though for update. This prevents them from being locked, modified or deleted by other transactions until the current transaction ends. That is, other transactions that attempt UPDATE, DELETE, SELECT FOR UPDATE, SELECT FOR NO KEY UPDATE, SELECT FOR SHARE or SELECT FOR KEY SHARE of these rows will be blocked until the current transaction ends; conversely, SELECT FOR UPDATE will wait for a concurrent transaction that has run any of those commands on the same row, and will then lock and return the updated row (or no row, if the row was deleted). Within a REPEATABLE READ or SERIALIZABLE transaction, however, an error will be thrown if a row to be locked has changed since the transaction started. For further discussion see Section 13.4."
A non-raw version of this example can be seen at: nodejs/sequelize/parallel_select_and_update.js.
nodejs/sequelize/raw/parallel_update_async.js
nodejs/sequelize/raw/parallel_update_worker_threads.js contains a base example that can be used to test what can happen when queries are being run in parallel. But it is broken due to a `sqlite3` Node.js package bug: github.com/mapbox/node-sqlite3/issues/1381...
nodejs/sequelize/raw/parallel_update_async.js is an `async` version of it. It should be just parallel enough to allow observing the same effects.
This is an example of a transaction where the SQL READ COMMITTED isolation level is sufficient.
These examples run queries of type:
UPDATE "MyInt" SET i = i + 1
Sample execution:
node --unhandled-rejections=strict ./parallel_update_async.js p 10 100
which does:
- PostgreSQL, see other database options at SQL example
- 10 threads
- 100 increments on each thread
The fear then is that of a classic read-modify-write failure.
But as the www.postgresql.org/docs/14/transaction-iso.html page makes very clear, including with an explicit example of type `UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 12345;`, the default isolation level, SQL READ COMMITTED, already prevents any problems with this, as the update always re-reads selected rows in case they were previously modified:
"If the first updater commits, the second updater will ignore the row if the first updater deleted it, otherwise it will attempt to apply its operation to the updated version of the row"
Since in PostgreSQL "Read uncommitted" appears to be effectively the same as "Read committed", we won't be able to observe any failures on that database system for this example.
nodejs/sequelize/raw/parallel_create_delete_empty_tag.js contains an example where things can actually blow up in read committed.
This feels good.
One problem though is that Heroku is very opinionated, likely like other PaaSes. So if you are trying something that is slightly off the most common use case, you might be fucked.
Another problem with Heroku is that it is extremely difficult to debug a build that is broken on Heroku but not locally. We needed a way to be able to drop into a shell in the middle of the build in case of failure. Otherwise it is impossible.
Deployment:
git push heroku HEAD:master
View stdout logs:
heroku logs --tail
The PostgreSQL database seems to be delegated to AWS. How to browse the database: stackoverflow.com/questions/20410873/how-can-i-browse-my-heroku-database
heroku pg:psql
Drop and recreate database (all tables are destroyed):
heroku pg:reset --confirm <app-name>
Restart app:
heroku restart
How to decide if an ORM is decent? Just try to replicate every SQL query from nodejs/sequelize/raw/many_to_many.js on PostgreSQL and SQLite.
There is only a very finite number of possible reasonable queries on a two table many to many relationship with a join table. A decent ORM has to be able to do them all.
If it can do all those queries, then the ORM can actually do a good subset of SQL and is decent. If not, it can't, and this will make you suffer. E.g. Sequelize v5 is such an ORM that makes you suffer.
The next thing to check are transactions.
Basically, all of those come up if you try to implement a blog hello world such as gothinkster/realworld correctly, i.e. without unnecessary inefficiencies due to your ORM on top of the underlying SQL, and dealing with concurrency.
TODO what is the standard compliant syntax?
PostgreSQL requires you to define a SQL stored procedure: stackoverflow.com/questions/28149494/is-it-possible-to-create-trigger-without-execute-procedure-in-postgresql. Their syntax may be standard compliant, not sure about the `EXECUTE` part. Their docs: www.postgresql.org/docs/current/sql-createtrigger.html
SQLite does not support SQL stored procedures at all, so maybe that's why they can't be standard compliant here: stackoverflow.com/questions/3335162/creating-stored-procedure-in-sqlite
SQL:1999 11.38 covers "Trigger definition". The abstract syntax tree starts with the `CREATE TRIGGER` and ends in:
<triggered SQL statement> ::= <SQL procedure statement>
This is defined at 13.5 "SQL procedure statement", but that is humongous and I'm not sure what it is at all.
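For concreteness, a minimal PostgreSQL trigger sketch (the tables, columns and function here are hypothetical, just to show the `EXECUTE PROCEDURE` plumbing):

```sql
-- The stored procedure that PostgreSQL requires:
CREATE FUNCTION update_post_count() RETURNS trigger AS $$
BEGIN
  UPDATE users SET post_count = post_count + 1 WHERE id = NEW.user_id;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- The trigger itself, delegating to the procedure:
CREATE TRIGGER post_insert AFTER INSERT ON posts
FOR EACH ROW EXECUTE PROCEDURE update_post_count();
```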
Evil company that desecrated the beauty created by Sun Microsystems, and was trying to bury Java once and for all in the 2010's.
Their database is already matched by open source e.g. PostgreSQL, and ERP and CRM specific systems are boring.
Oracle basically grew out of selling one of the first SQL implementations in the late 70's, notably to the United States Government and particularly the CIA. They did deliver a lot of value in those early pre-internet days, but now open source is supplanting them, and will do so entirely.
This section was tested on Ubuntu 24.10, PostgreSQL 16.6.
Let's create some test data like this:
time psql tmp -c 'DROP TABLE IF EXISTS fts;'
time psql tmp -c 'CREATE TABLE fts(s TEXT, i INTEGER);'
time psql tmp <<'EOF'
INSERT INTO fts SELECT
i::text || ' ' ||
(i * 2 )::text || ' ' ||
(i * 5 )::text || ' ' ||
(i * 7 )::text || ' ' ||
(i * 11 )::text || ' ' ||
(i * 13 )::text || ' ' ||
(i * 17 )::text || ' ' ||
(i * 23 )::text || ' ' ||
(i * 29 )::text || ' ' ||
(i * 31 )::text
,
i % 100
FROM generate_series(1::bigint, 100000000::bigint) AS s(i);
EOF
The creation time was 2m13s, and the final size was:
table_name | pg_size_pretty | pg_total_relation_size
------------------+----------------+------------------------
fts | 13 GB | 14067326976
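The size can be checked with a query along these lines (a sketch, not necessarily the exact query used):

```sql
SELECT 'fts' AS table_name,
       pg_size_pretty(pg_total_relation_size('fts')),
       pg_total_relation_size('fts');
```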
This test data will be simple to predict what each line contains so we can make educated queries, while also posing some difficulty to the RDBMS. As per:
time psql tmp -c 'SELECT * FROM fts LIMIT 10;'
the first rows look like:
s | i
-------------------------------------+----
1 2 5 7 11 13 17 23 29 31 | 1
2 4 10 14 22 26 34 46 58 62 | 2
3 6 15 21 33 39 51 69 87 93 | 3
4 8 20 28 44 52 68 92 116 124 | 4
5 10 25 35 55 65 85 115 145 155 | 5
6 12 30 42 66 78 102 138 174 186 | 6
7 14 35 49 77 91 119 161 203 217 | 7
8 16 40 56 88 104 136 184 232 248 | 8
9 18 45 63 99 117 153 207 261 279 | 9
10 20 50 70 110 130 170 230 290 310 | 10
We aimed to create a test table of size around 10 GB, as in practice it is around that order of size that index speedups start to become very obvious on an SSD-based system.
Before we create the index, let's see if our non-indexed queries are slow enough for our tests:
time psql tmp -c "SELECT * FROM fts WHERE s LIKE '% 50000000 %';"
which gives:
s | i
---------------------------------------------------------------------------------------------------+---
10000000 20000000 50000000 70000000 110000000 130000000 170000000 230000000 290000000 310000000 | 0
25000000 50000000 125000000 175000000 275000000 325000000 425000000 575000000 725000000 775000000 | 0
(2 rows)

real 0m11.758s
user 0m0.017s
sys 0m0.008s
so it should be enough to observe the index speedup.
Now let's create the index. First we create a generated column that splits the strings with `to_tsvector`, and then we index that split column:
time psql tmp <<'EOF'
ALTER TABLE fts ADD COLUMN s_ts tsvector
  GENERATED ALWAYS AS (to_tsvector('english', s)) STORED;
EOF
time psql tmp -c 'CREATE INDEX s_ts_gin_idx ON fts USING GIN (s_ts);'
These commands took 8m51s and 40m8s respectively, and the DB size went up about 5x:
table_name | pg_size_pretty | pg_total_relation_size
------------------+----------------+------------------------
fts | 69 GB | 74487758848
And finally let's try out the index:
time psql tmp -c "SELECT s, i FROM fts WHERE s_ts @@ to_tsquery('english', '50000000');"
which "instantly" gives us in 0m0.129s:
s | i
-------------------------------------------------------------------------------------------------------+---
10000000 20000000 50000000 70000000 110000000 130000000 170000000 230000000 290000000 310000000 | 0
25000000 50000000 125000000 175000000 275000000 325000000 425000000 575000000 725000000 775000000 | 0
50000000 100000000 250000000 350000000 550000000 650000000 850000000 1150000000 1450000000 1550000000 | 0
so the index worked! We understand from this that it only finds exact word hits.
Another important use case is to search for prefixes of words, e.g. as you'd want in a simple autocompletion system. This can be achieved by adding `:*` at the end of the search term as in:
time psql tmp -c "SELECT s, i FROM fts WHERE s_ts @@ to_tsquery('english', '50000000:*');"
This finishes in the same amount of time, and gives:
s | i
-----------------------------------------------------------------------------------------------------------+----
10000000 20000000 50000000 70000000 110000000 130000000 170000000 230000000 290000000 310000000 | 0
38461539 76923078 192307695 269230773 423076929 500000007 653846163 884615397 1115384631 1192307709 | 39
45454546 90909092 227272730 318181822 500000006 590909098 772727282 1045454558 1318181834 1409090926 | 46
50000000 100000000 250000000 350000000 550000000 650000000 850000000 1150000000 1450000000 1550000000 | 0
71428572 142857144 357142860 500000004 785714292 928571436 1214285724 1642857156 2071428588 2214285732 | 72
100000000 200000000 500000000 700000000 1100000000 1300000000 1700000000 2300000000 2900000000 3100000000 | 0
29411765 58823530 147058825 205882355 323529415 382352945 500000005 676470595 852941185 911764715 | 65
25000000 50000000 125000000 175000000 275000000 325000000 425000000 575000000 725000000 775000000 | 0
so now we have cool hits such as `500000000`, `500000004`, `500000005`, `500000006` and `500000007`. The syntax is also mentioned in the PostgreSQL documentation.
Next we can also try some other queries with multiple terms. Text must contain two words, with `&`:
time psql tmp -c "SELECT s, i FROM fts WHERE s_ts @@ to_tsquery('english', '50000000 & 175000000');"
which gives:
s | i
-------------------------------------------------------------------------------------------------------+---
25000000 50000000 125000000 175000000 275000000 325000000 425000000 575000000 725000000 775000000 | 0
Text can contain either word, with `|`:
time psql tmp -c "SELECT s, i FROM fts WHERE s_ts @@ to_tsquery('english', '50000000 | 175000000');"
which gives:
s | i
---------------------------------------------------------------------------------------------------------+---
10000000 20000000 50000000 70000000 110000000 130000000 170000000 230000000 290000000 310000000 | 0
50000000 100000000 250000000 350000000 550000000 650000000 850000000 1150000000 1450000000 1550000000 | 0
87500000 175000000 437500000 612500000 962500000 1137500000 1487500000 2012500000 2537500000 2712500000 | 0
25000000 50000000 125000000 175000000 275000000 325000000 425000000 575000000 725000000 775000000 | 0
35000000 70000000 175000000 245000000 385000000 455000000 595000000 805000000 1015000000 1085000000 | 0
Text must contain the given words sequentially, with `<->`:
time psql tmp -c "SELECT s, i FROM fts WHERE s_ts @@ to_tsquery('english', '50000000 <-> 125000000 <-> 175000000');"
which gives:
s | i
-------------------------------------------------------------------------------------------------------+---
25000000 50000000 125000000 175000000 275000000 325000000 425000000 575000000 725000000 775000000 | 0
We can also inspect how words were split by simply doing a `SELECT *` again:
again: s | i | s_ts
----------------------------+---+----------------------------------------------------------------------
1 2 5 7 11 13 17 23 29 31 | 1 | '1':1 '11':5 '13':6 '17':7 '2':2 '23':8 '29':9 '31':10 '5':3 '7':4
2 4 10 14 22 26 34 46 58 62 | 2 | '10':3 '14':4 '2':1 '22':5 '26':6 '34':7 '4':2 '46':8 '58':9 '62':10
3 6 15 21 33 39 51 69 87 93 | 3 | '15':3 '21':4 '3':1 '33':5 '39':6 '51':7 '6':2 '69':8 '87':9 '93':10
Let's check if the index updates automatically when we do an insert, and if insertion seems to have been significantly slowed down by the index:
time psql tmp -c "INSERT INTO fts VALUES ('abcd efgh', 99)"
This finishes in:
real 0m0.043s
user 0m0.014s
sys 0m0.010s
so performance is OK. Presumably, the insertion time is proportional to the number of tokens, doing one logarithmic operation per token, so indexing short chunks of text like titles is easy. And then let's find it:
time psql tmp -c "SELECT s, i FROM fts WHERE s_ts @@ to_tsquery('english', 'efgh');"
which finds it with:
s | i
-----------+----
abcd efgh | 99
so we are all good. Unfortunately, accurate performance benchmarking is a bit harder than that, as the index by default first collects a certain number of updates into memory into the "pending list", before actually inserting them all at once after a certain mass is reached, as documented at: www.postgresql.org/docs/17/gin.html#GIN-IMPLEMENTATION. We are not going that deep today.
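For what it's worth, the pending list can be flushed into the main GIN structure manually, e.g. (a sketch, using the index name we created above):

```sql
SELECT gin_clean_pending_list('s_ts_gin_idx'::regclass);
```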
The next thing that we need to understand is how `to_tsvector` tokenizes strings for the `english` language. For example, running:
psql -c "select to_tsvector('english', 'A Dog run runs fast faster two Cats: b c to from 1 é befhyph-afthyph.')"
gives:
'1':13
'afthyph':17
'b':9
'befhyph':16
'befhyph-afthyph':15
'c':10
'cat':8
'dog':2
'fast':5
'faster':6
'run':3,4
'two':7
'é':14
so we understand some of the heuristic normalizations:
- prepositions like `to` and `from` are gone. These are called stopwords, as documented at: www.postgresql.org/docs/17/textsearch-controls.html#TEXTSEARCH-PARSING-DOCUMENTS
- words are lowercased and singularized, e.g. `Cats` becomes `cat`
- hyphenated words are stored both in separate components and in the full hyphenated form: `'afthyph':17 'befhyph':16 'befhyph-afthyph':15`
The full list of languages available can be obtained with:
psql -c '\dF'
On Ubuntu 24.10, the list contains major world languages, plus the special `simple` configuration. Running:
psql -c "select to_tsvector('simple', 'A Dog run runs fast faster two Cats: b c to from 1 é befhyph-afthyph.')"
gives:
'1':13
'a':1
'afthyph':17
'b':9
'befhyph':16
'befhyph-afthyph':15
'c':10
'cats':8
'dog':2
'fast':5
'faster':6
'from':12
'run':3
'runs':4
'to':11
'two':7
'é':14
so we understand that `simple` is similar to `english`, but it does not:
- seem to have any stopwords
- do singularization normalization
From the query side of things, if the query is going to be open to end users on a web interface, we need to understand `to_tsquery` better. The issue is that `to_tsquery` is quite brutal and happily throws errors for common things users might do, e.g. spaces:
select to_tsquery('english', 'abc def');
giving:
ERROR:  syntax error in tsquery: "abc def"
To avoid such errors, we can use:
- `plainto_tsquery`: ANDs everything
- `websearch_to_tsquery`: supports AND with spaces, OR with `or`, word negation with `-word` and concatenation with `"my word"`. But it unfortunately does not support prefixing, which is what everyone and their mother wants for autocomplete: stackoverflow.com/questions/14103880/escaping-special-characters-in-to-tsquery#comment78452351_41804957
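For example (a quick sketch; the exact output formatting may vary across PostgreSQL versions):

```sql
select plainto_tsquery('english', 'abc def');
-- 'abc' & 'def'
select websearch_to_tsquery('english', 'abc or def -ghi');
-- 'abc' | 'def' & !'ghi'
```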
Bibliography:
- stackoverflow.com/questions/16020164/psqlexception-error-syntax-error-in-tsquery/16020565#16020565
- stackoverflow.com/questions/14103880/escaping-special-characters-in-to-tsquery
- www.reddit.com/r/rails/comments/1cmloa6/sanitizing_a_search_phrase_when_using_to_tsvector/
- stackoverflow.com/questions/6735881/to-tsquery-validation
- dba.stackexchange.com/questions/135030/filtering-special-characters-in-to-tsquery
Also posted at:
- www.reddit.com/r/PostgreSQL/comments/12yld1o/comment/m3l5nkv/ "Is it worth using Postgres' builtin full-text search or should I go straight to Elastic?", a top Google result for "PostgreSQL full text search" as of 2024. Random, but it's there.
To run examples on a specific database:
- `./index.js` or `./index.js l`: SQLite
- `./index.js p`: PostgreSQL. You must manually create a database called `tmp` and ensure that peer authentication works for it
All examples can be tested on all databases with:
cd sequelize
./test
Overview of the examples:
- nodejs/sequelize/index.js: a bunch of basic examples
- nodejs/sequelize/update.js: this file is also where we are storing our expression-foo for now, e.g. how to do stuff like `col1 + col2`. Such knowledge can however be used basically anywhere else, e.g. in `AS` or `WHERE` clauses, not just in `UPDATE`.
- nodejs/sequelize/count.js: a simplified single-table count example. In practice, this will usually be done together with JOIN queries across multiple tables. Answers: stackoverflow.com/questions/22627258/how-does-group-by-works-in-sequelize/69896449#69896449
- nodejs/sequelize/date.js: automatic date typecasts
- nodejs/sequelize/like.js: LIKE
- nodejs/sequelize/camel_case.js: trying to get everything in the database camel cased, columns starting with lowercase, and tables starting with uppercase. The defaults documented on getting started documentation do uppercase foreign keys, and lowercase non-foreign keys. It's a mess.
- nodejs/sequelize/ignore_duplicates.js: ignore query on unique violation with `ignoreDuplicates: true`, which does SQLite `INSERT OR IGNORE INTO` or PostgreSQL `ON CONFLICT DO NOTHING`. Closely related upsert versions:
  - nodejs/sequelize/upsert.js: `.upsert` selects the conflict column automatically by unique columns. Both SQLite and PostgreSQL do `INSERT INTO ON CONFLICT` (see the SQL sketch after this list). PostgreSQL uses `RETURNING`, which was added too recently to SQLite: www.sqlite.org/lang_returning.html. At nodejs/sequelize/composite_index.js we have a `.upsert` test with a composite index. Works just fine as you'd expect with a composite `ON CONFLICT`. Well done.
  - nodejs/sequelize/update_on_duplicate.js: `.bulkCreate({}, { updateOnDuplicate: ['col1', 'col2'] })`. Produces queries analogous to `.upsert`. This method is cool because it can upsert multiple columns at once. But it is annoying that you have to specify all fields to be updated one by one manually.
  - stackoverflow.com/questions/29063232/how-to-get-the-id-of-an-inserted-or-updated-record-in-sequelize-upsert/72092277#72092277
  - stackoverflow.com/questions/55531860/sequelize-bulkcreate-updateonduplicate-for-postgresql
- nodejs/sequelize/inc.js: demonstrates the `increment` method. In SQLite, it produces a statement of type: UPDATE `IntegerNames` SET `value`=`value`+ 1,`updatedAt`='2021-11-03 10:23:45.409 +00:00' WHERE `id` = 3
- nodejs/sequelize/sync_alter.js: illustrates `Model.sync({alter: true})` to modify a table definition. Answers: stackoverflow.com/questions/54898994/bulkupdate-in-sequelize-orm/69044138#69044138
- nodejs/sequelize/truncate_key.js
- nodejs/sequelize/validation.js: validation is handled by a third-party library: github.com/validatorjs/validator.js. They then add a few extra validators on top of that. The `args: true` thing is explained at: stackoverflow.com/questions/58522387/unhandled-rejection-sequelizevalidationerror-validation-error-cannot-create-pr/70263032#70263032
- nodejs/sequelize/composite_index.js: stackoverflow.com/questions/34664853/sequelize-composite-unique-constraint
- nodejs/sequelize/indent_log.js: stackoverflow.com/questions/34664853/sequelize-composite-unique-constraint
- association examples:
- nodejs/sequelize/one_to_many.js: basic one-to-many examples.
- nodejs/sequelize/many_to_many.js: basic many-to-many examples, each user can like multiple posts. Answers: stackoverflow.com/questions/22958683/how-to-implement-many-to-many-association-in-sequelize/67973948#67973948
- ORDER BY include:
- nodejs/sequelize/many_to_many_custom_table.js: many-to-many example, but where we craft our own table which can hold extra data. In our case, users can like posts, but likes have a integer weight associated with them. Related threads:
- nodejs/sequelize/many_to_many_same_model.js: association between a model and itself: users can follow other users. Related:
- nodejs/sequelize/many_to_many_same_model_super.js
- nodejs/sequelize/many_to_many_super.js: "Super many to many": sequelize.org/master/manual/advanced-many-to-many.html. This should not exist and shows how bad this library is for associations: you need all that boilerplate in order to expose certain relationships that aren't otherwise exposed by a direct `hasMany` with implicit join table.
- nested includes to produce queries with multiple JOIN:
- nodejs/sequelize/nested_include.js: find all posts by users that a given user follows. Answers: stackoverflow.com/questions/42632943/sequelize-multiple-where-clause/68018083#68018083
- nodejs/sequelize/nested_include_super.js: like nodejs/sequelize/nested_include.js but with a super many to many. We should move this to nodejs/sequelize/many_to_many_super.js.
- two relationships between two specific tables: we need to use `as:` to disambiguate them
  - nodejs/sequelize/many_to_many_double.js: users can both follow and like posts
- nodejs/sequelize/one_to_many_double.js: posts have the author and a mandatory reviewer
- hooks
- internals:
- nodejs/sequelize/common.js: common utilities used across examples, most notably:
- to easily set up different RDBMSs
- nodejs/sequelize/min_nocommon.js: to copy paste to Stack Overflow
- nodejs/sequelize/min.js: template for new examples in the folder
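As referenced from the upsert item above, a rough sketch of the kind of SQL that `.upsert` produces on PostgreSQL (table and column names here are made up for illustration):

```sql
INSERT INTO "Users" ("id", "name") VALUES (1, 'alice')
ON CONFLICT ("id") DO UPDATE SET "name" = EXCLUDED."name"
RETURNING "id";
```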