Database management system Updated +Created
A software that implements some database system, e.g. PostgreSQL or MySQL are two (widely extended) SQL implementations.
DELETE with JOIN (SQL) Updated +Created
NO way in the SQL standard apparently, but you'd hope that implementation status would be similar to UPDATE with JOIN, but not even!
feathers-chat PostgreSQL Updated +Created
The default feathers-chat app runs on NeDB (local filesystem JSON database).
@cirosantilli/_file/nodejs/sequelize/nodejs/sequelize/parallel_select_and_update.js Updated +Created
This example is the same as nodejs/sequelize/raw/parallel_select_and_update.js, but going through Sequelize rather than with Sequelize raw queries. NONE is not supported for now to not have a transaction at all because lazy.
Sample invocation:
node --unhandled-rejections=strict ./parallel_select_and_update.js p 10 100 READ_COMMITTED UPDATE
where:
Other examples:
  • node --unhandled-rejections=strict ./parallel_select_and_update.js p 10 100 READ_COMMITTED UPDATE
Then, the outcome is exactly as described at: nodejs/sequelize/raw/parallel_select_and_update.js:
In this example, posts have tags. When a post is deleted, we check to see if there are now any empty tags, and now we want to delete any empty tags that the post deletion may have created.
If we are creating and deleting posts concurrently, a naive implementation might wrongly delete the tags of a newly created post.
This could be due to a concurrency issue of the following types.
Failure case 1:
  • thread 2: delete old post
  • thread 2: find all tags with 0 posts. Finds tag0 from the deleted old post which is now empty.
  • thread 1: create new post, which we want to have tag tag0
  • thread 1: try to create a new tag tag0, but don't because it already exists, this is done using SQLite's INSERT OR IGNORE INTO or PostgreSQL's INSERT ... ON CONFLICT DO NOTHING
  • thread 1: assign tag0 to the new post by adding an entry to the join table
  • thread 2: delete all tags with 0 posts. It still sees from its previous search that tag0 is empty, and deletes it, which then cascades into the join table
which would result in the new post incorrectly not having the tag0.
Failure case 2:
  • thread 2: delete old post
  • thread 2: find all tags with 0 posts
  • thread 1: create new post
  • thread 1: try to create a new tag tag0, but don't because it already exists
  • thread 2: delete all tags with 0 posts. It still sees from its previous search that tag0 is empty, and deletes it
  • thread 1: assign tag0 to the new post
which leads to a foreign key failure, because the tag does not exist anymore when the assignment happens.
Failure case 3:
  • thread 2: delete old post
  • thread 1: create new post, which we want to have tag tag0
  • thread 1: try to create a new tag tag0, and succeed because it wasn't present
  • thread 2: find all tags with 0 posts, finds the tag that was just created
  • thread 2: delete all tags with 0 posts, deleting the new tag
  • thread 1: assign tag0 to the new post
which leads to a foreign key failure, because the tag does not exist anymore when the assignment happens.
Sample executions:
  • node --unhandled-rejections=strict ./parallel_create_delete_empty_tag.js p 9 1000 'READ COMMITTED': PostgreSQL, 9 tags, DELETE/CREATE the tag0 test tag 1000 times, use READ COMMITTED
    Execution often fails, although not always. The failure is always:
    error: insert or update on table "PostTag" violates foreign key constraint "PostTag_tagId_fkey"
    because the:
    INSERT INTO "PostTag"
    tries to insert a tag that was deleted in the other thread, as it didn't have any corresponding posts, so this is the foreign key failure.
    TODO: we've never managed to observe the failure case in which tag0 is deleted. Is it truly possible? And if not, by which guarantee?
  • node --unhandled-rejections=strict ./parallel_create_delete_empty_tag.js p 9 1000 'READ COMMITTED' 'FOR UPDATE': do a SELECT ... FOR UPDATE before trying to INSERT.
    This is likely correct and the fastest correct method according to our quick benchmarking, about 20% faster than REPEATABLE READ.
    We are just now 100% sure it is corret becase we can't find out if the SELECT in the DELETE subquery could first select some rows, which are then locked by the tag creator, and only then locked by DELETE after selection. Or does it re-evaludate the SELECT even though it is in a subquery?
  • node --unhandled-rejections=strict ./parallel_create_delete_empty_tag.js p 9 1000 'REPEATABLE READ': repeatable read
    We've never observed any failures with this level. This should likely fix the foreign key issue according to the PostgreSQL docs, since:
    • the DELETE "Post" commit cannot start to be seen only in the middle of the thread 1 transaction
    • and then if DELETE happened, the thread 1 transaction will detect it, ROLLBACK, and re-run. TODO how does it detect the need rollback? Is it because of the foreign key? It is very hard to be sure about this kind of thing, just can't find the information. Related: postgreSQL serialization failure.
  • node --unhandled-rejections=strict ./parallel_create_delete_empty_tag.js p 9 1000 'SERIALIZABLE': serializable
  • node --unhandled-rejections=strict ./parallel_create_delete_empty_tag.js p 9 1000 'NONE': magic value, don't use any transaction. Can blow up of course, since even less restrictions than READ COMMITTED
All executions use 2 threads.
Some theoretical notes:
  • Failure case 3 is averted by a READ COMMITTED transaction, because thread 2 won't see the uncommitted tag that thread 1 created, and therefore won't be able to delete it
stackoverflow.com/questions/10935850/when-to-use-select-for-update from SELECT FOR UPDATE also talks about a similar example, and has relevant answers.
This example contains a deterministic demo of when postgreSQL serialization failures may happen.
Tested on PostgreSQL 13.5.
This example is similar to nodejs/sequelize/raw/parallel_update_async.js, but now we are doing a separate SELECT, later followed by an update:
  • SELECT FROM to get i
  • update on Js code newI = i + 1
  • UPDATE SET the newI
Although this specific example is useless in itself, as we could just use UPDATE "MyInt" SET i = i + 1 as in nodejs/sequelize/raw/parallel_update_async.js, which automatically solves any concurrency issue, this kind of code could be required for example if the update was a complex function not suitably implemented in SQL, or if the update depends on some external data source.
Sample execution:
node --unhandled-rejections=strict ./parallel_select_and_update.js p 2 10 'READ COMMITTED'
which does:
Another one:
node --unhandled-rejections=strict ./parallel_select_and_update.js p 2 10 'READ COMMITTED' 'FOR UPDATE'
this will run SELECT FOR UPDATE rather than just SELECT
Observed behaviour under different SQL transaction isolation levels:
  • READ COMMITTED: fails. Nothing in this case prevents:
    • thread 1: SELECT, obtains i = 0
    • thread 2: SELECT, obtains i = 0
    • thread 2: newI = 1
    • thread 2: UPDATE i = 1
    • thread 1: newI = 1
    • thread 1: UPDATE i = 1
  • REPEATABLE READ: works. the manual mentions that if multiple concurrent updates would happen, only the first commit succeeds, and the following ones fail and rollback and retry, therefore preventing the loss of an update.
  • READ COMMITTED + SELECT FOR UPDATE: works. And does not do rollbacks, which probably makes it faster. With p 10 100, REPEATABLE READ was about 4.2s and READ COMMITTED + SELECT FOR UPDATE 3.2s on Lenovo ThinkPad P51 (2017).
    SELECT FOR UPDATE should be enough as mentioned at: www.postgresql.org/docs/13/explicit-locking.html#LOCKING-ROWS
    FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as though for update. This prevents them from being locked, modified or deleted by other transactions until the current transaction ends. That is, other transactions that attempt UPDATE, DELETE, SELECT FOR UPDATE, SELECT FOR NO KEY UPDATE, SELECT FOR SHARE or SELECT FOR KEY SHARE of these rows will be blocked until the current transaction ends; conversely, SELECT FOR UPDATE will wait for a concurrent transaction that has run any of those commands on the same row, and will then lock and return the updated row (or no row, if the row was deleted). Within a REPEATABLE READ or SERIALIZABLE transaction, however, an error will be thrown if a row to be locked has changed since the transaction started. For further discussion see Section 13.4.
A non-raw version of this example can be seen at: nodejs/sequelize/parallel_select_and_update.js.
nodejs/sequelize/raw/parallel_update_worker_threads.js contains a base example that can be used to test what can happen when queries are being run in parallel. But it is broken due to a sqlite3 Node.js package bug: github.com/mapbox/node-sqlite3/issues/1381...
nodejs/sequelize/raw/parallel_update_async.js is an async version of it. It should be just parallel enough to allow observing the same effects.
This is an example of a transaction where the SQL READ COMMITTED isolation level if sufficient.
These examples run queries of type:
UPDATE "MyInt" SET i = i + 1
Sample execution:
node --unhandled-rejections=strict ./parallel_update_async.js p 10 100
which does:
The fear then is that of a classic read-modify-write failure.
But as www.postgresql.org/docs/14/transaction-iso.html page makes very clear, including with an explicit example of type UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 12345;, that the default isolation level, SQL READ COMMITTED isolation level, already prevents any problems with this, as the update always re-reads selected rows in case they were previously modified.
If the first updater commits, the second updater will ignore the row if the first updater deleted it, otherwise it will attempt to apply its operation to the updated version of the row
Since in PostgreSQL "Read uncommitted" appears to be effectively the same as "Read committed", we won't be able to observe any failures on that database system for this example.
nodejs/sequelize/raw/parallel_create_delete_empty_tag.js contains an example where things can actually blow up in read committed.
Heroku Updated +Created
This feels good.
One problem though is that Heroku is very opinionated, a likely like other PaaSes. So if you are trying something that is slightly off the mos common use case, you might be fucked.
Another problem with Heroku is that it is extremely difficult to debug a build that is broken on Heroku but not locally. We needed a way to be able to drop into a shell in the middle of build in case of failure. Otherwise it is impossible.
Deployment:
git push heroku HEAD:master
View stdout logs:
heroku logs --tail
PostgreSQL database, it seems to be delegated to AWS. How to browse database: stackoverflow.com/questions/20410873/how-can-i-browse-my-heroku-database
heroku pg:psql
Drop and recreate database:
heroku pg:reset --confirm <app-name>
All tables are destroyed.
Restart app:
heroku restart
How to decide if an ORM is good? Updated +Created
How to decide if an ORM is decent? Just try to replicate every SQL query from nodejs/sequelize/raw/many_to_many.js on PostgreSQL and SQLite.
There is only a very finite number of possible reasonable queries on a two table many to many relationship with a join table. A decent ORM has to be able to do them all.
If it can do all those queries, then the ORM can actually do a good subset of SQL and is decent. If not, it can't, and this will make you suffer. E.g. Sequelize v5 is such an ORM that makes you suffer.
The next thing to check are transactions.
Basically, all of those come up if you try to implement a blog hello world world such as gothinkster/realworld correctly, i.e. without unnecessary inefficiencies due to your ORM on top of underlying SQL, and dealing with concurrency.
ISO SQL TRIGGER syntax Updated +Created
TODO what is the standard compliant syntax?
SQLite does not support SQL stored procedures at all, so maybe that's why they can't be standard compliant here: stackoverflow.com/questions/3335162/creating-stored-procedure-in-sqlite
SQL:1999 11.38 covers "Trigger definition". The Abstract syntax tree starts with the CREATE TRIGGER and ends in:
<triggered SQL statement> ::=
  <SQL procedure statement>
This is defined at 13.5 "SQL procedure statement", but that is humongous and I'm not sure what it is at all.
Oracle Corporation Updated +Created
Evil company that desecrated the beauty created by Sun Microsystems, and was trying to bury Java once and or all in the 2010's.
Their database is already matched by open source e.g. PostgreSQL, and ERP and CRM specific systems are boring.
Oracle basically grew out of selling one of the first SQL implementations in the late 70's, and notably to the United States Government and particularly the CIA. They did deliver a lot of value in those early pre-internet days, but now open source is and will supplant them entirely.
Sequelize example Updated +Created
To run examples on a specific database:
  • ./index.js or ./index.js l: SQLite
  • ./index.js p: PostgreSQL. You must manually create a database called tmp and ensure that peer authentication works for it
All examples can be tested on all databases with:
cd sequelize
./test
Overview of the examples:
cirosantilli.com content uploaded to ourbigbook.com/cirosantilli Updated +Created
Managed to upload the content from the static website cirosantilli.com (OurBigBook Markup source at github.com/cirosantilli/cirosantilli.github.io) to ourbigbook.com/cirosantilli.
Although most of the key requirements were already in place since the last update, as usual doing things with the complex reference content stresses the system further and leads to the exposition of several new bugs.
The upload of OurBigBook Markup files to ourbigbook.com was done with the newly added OurBigBook CLI ourbigbook --web option. Although fully exposed to end users, the setup is not super efficient: a trully decent implementation should only upload changed files, and would basically mean reimplementing/using Git, since version diffing is what Git shines at. But I've decided not to put much emphasis on CLI upload for now, since it is expected that initially the majority of users will use the Web UI only. The functionality was added primarily to upload the reference content.
This is a major milestone, as the new content can start attracting new users, and makes the purpose of the website much clearer. Just having this more realistic content also immediately highlighted what the next development steps need to be.
Once v1.0 is reached, I will actually make all internal links of cirosantilli.com to point to ourbigbook.com/cirosantilli to try and drive some more traffic.
The new content blows up by far the limit of the free Heroku PostgreSQL database of 10k lines. This meant that I needed to upgrade the Heroku Postgres plugin from the free Hobby Dev to the 9 USD/month Hobby Basic: elements.heroku.com/addons/heroku-postgresql, so now hosting costs will increase from 7 USD/month for the dyno to 7 + 9 = 16 UDS/month. After this upgrade and uploading all of cirosantilli.com to ourbigbook.com, Heroku dashboard reads reads:
  • 30,918 rows out of 10,000,000
  • 61.0 MB (out of 10 GB)
so clearly if we are ever forced to upgrade plans again, it means that a bunch of people are using the website and that things are going very very well! Happy how this storage cost turned out so far.
One key limitation found was that Heroku RAM memory is quite limited at 512MB, and JavaScript is not exactly the most memory economical language out there. Started investigation at: github.com/ourbigbook/ourbigbook/issues/230 Initially working around that by simply splitting the largest files. We were just on the verge of what could be ran however luckily, so a few dozen splits was enough, it managed to handle 70 kB OurBigBook Markup inputs. So hopefully if we manage to optimize a bit more we will be able to set a maximum size of 100 kB and still have a good safety margin.