DELETE with JOIN (SQL) Updated +Created
NO way in the SQL standard apparently, but you'd hope that implementation status would be similar to UPDATE with JOIN, but not even!
In this example, posts have tags. When a post is deleted, we check to see if there are now any empty tags, and now we want to delete any empty tags that the post deletion may have created.
If we are creating and deleting posts concurrently, a naive implementation might wrongly delete the tags of a newly created post.
This could be due to a concurrency issue of the following types.
Failure case 1:
  • thread 2: delete old post
  • thread 2: find all tags with 0 posts. Finds tag0 from the deleted old post which is now empty.
  • thread 1: create new post, which we want to have tag tag0
  • thread 1: try to create a new tag tag0, but don't because it already exists, this is done using SQLite's INSERT OR IGNORE INTO or PostgreSQL's INSERT ... ON CONFLICT DO NOTHING
  • thread 1: assign tag0 to the new post by adding an entry to the join table
  • thread 2: delete all tags with 0 posts. It still sees from its previous search that tag0 is empty, and deletes it, which then cascades into the join table
which would result in the new post incorrectly not having the tag0.
Failure case 2:
  • thread 2: delete old post
  • thread 2: find all tags with 0 posts
  • thread 1: create new post
  • thread 1: try to create a new tag tag0, but don't because it already exists
  • thread 2: delete all tags with 0 posts. It still sees from its previous search that tag0 is empty, and deletes it
  • thread 1: assign tag0 to the new post
which leads to a foreign key failure, because the tag does not exist anymore when the assignment happens.
Failure case 3:
  • thread 2: delete old post
  • thread 1: create new post, which we want to have tag tag0
  • thread 1: try to create a new tag tag0, and succeed because it wasn't present
  • thread 2: find all tags with 0 posts, finds the tag that was just created
  • thread 2: delete all tags with 0 posts, deleting the new tag
  • thread 1: assign tag0 to the new post
which leads to a foreign key failure, because the tag does not exist anymore when the assignment happens.
Sample executions:
  • node --unhandled-rejections=strict ./parallel_create_delete_empty_tag.js p 9 1000 'READ COMMITTED': PostgreSQL, 9 tags, DELETE/CREATE the tag0 test tag 1000 times, use READ COMMITTED
    Execution often fails, although not always. The failure is always:
    error: insert or update on table "PostTag" violates foreign key constraint "PostTag_tagId_fkey"
    because the:
    INSERT INTO "PostTag"
    tries to insert a tag that was deleted in the other thread, as it didn't have any corresponding posts, so this is the foreign key failure.
    TODO: we've never managed to observe the failure case in which tag0 is deleted. Is it truly possible? And if not, by which guarantee?
  • node --unhandled-rejections=strict ./parallel_create_delete_empty_tag.js p 9 1000 'READ COMMITTED' 'FOR UPDATE': do a SELECT ... FOR UPDATE before trying to INSERT.
    This is likely correct and the fastest correct method according to our quick benchmarking, about 20% faster than REPEATABLE READ.
    We are just now 100% sure it is corret becase we can't find out if the SELECT in the DELETE subquery could first select some rows, which are then locked by the tag creator, and only then locked by DELETE after selection. Or does it re-evaludate the SELECT even though it is in a subquery?
  • node --unhandled-rejections=strict ./parallel_create_delete_empty_tag.js p 9 1000 'REPEATABLE READ': repeatable read
    We've never observed any failures with this level. This should likely fix the foreign key issue according to the PostgreSQL docs, since:
    • the DELETE "Post" commit cannot start to be seen only in the middle of the thread 1 transaction
    • and then if DELETE happened, the thread 1 transaction will detect it, ROLLBACK, and re-run. TODO how does it detect the need rollback? Is it because of the foreign key? It is very hard to be sure about this kind of thing, just can't find the information. Related: postgreSQL serialization failure.
  • node --unhandled-rejections=strict ./parallel_create_delete_empty_tag.js p 9 1000 'SERIALIZABLE': serializable
  • node --unhandled-rejections=strict ./parallel_create_delete_empty_tag.js p 9 1000 'NONE': magic value, don't use any transaction. Can blow up of course, since even less restrictions than READ COMMITTED
All executions use 2 threads.
Some theoretical notes:
  • Failure case 3 is averted by a READ COMMITTED transaction, because thread 2 won't see the uncommitted tag that thread 1 created, and therefore won't be able to delete it
stackoverflow.com/questions/10935850/when-to-use-select-for-update from SELECT FOR UPDATE also talks about a similar example, and has relevant answers.
Get Bitcoin transaction id from position in dat file Updated +Created
Suppose we specify:
  • a .dat file
  • the offset in bytes within that file
The question then is, which transaction is encoded at that position of the file?
This would allow us to index inscriptions in the .dat files directly with fast C tools, and then retrive the transaction ID to get cleaner data and metadata.
It should be possible if we managed to take the information from bitcoindev.network/understanding-the-data/ and dump into an indexed SQLite database.
I tried to start things off with LevelDBDumper:
LevelDBDumper -d ~/snap/bitcoin-core/common/.bitcoin/indexes/txindex -f btc.csv -q -o . -t csv
but that consumed all 64 GB of RAM on P51... github.com/mdawsonuk/LevelDBDumper/issues/15
But OK, nevermind that repo, it can be done easily with the LevelDB API of any language: bitcoin.stackexchange.com/questions/121888/what-is-the-data-format-layout-for-txindex-leveldb-values. Just the data seems wrong and we don't know why.
How to decide if an ORM is good? Updated +Created
How to decide if an ORM is decent? Just try to replicate every SQL query from nodejs/sequelize/raw/many_to_many.js on PostgreSQL and SQLite.
There is only a very finite number of possible reasonable queries on a two table many to many relationship with a join table. A decent ORM has to be able to do them all.
If it can do all those queries, then the ORM can actually do a good subset of SQL and is decent. If not, it can't, and this will make you suffer. E.g. Sequelize v5 is such an ORM that makes you suffer.
The next thing to check are transactions.
Basically, all of those come up if you try to implement a blog hello world world such as gothinkster/realworld correctly, i.e. without unnecessary inefficiencies due to your ORM on top of underlying SQL, and dealing with concurrency.
ISO SQL TRIGGER syntax Updated +Created
TODO what is the standard compliant syntax?
SQLite does not support SQL stored procedures at all, so maybe that's why they can't be standard compliant here: stackoverflow.com/questions/3335162/creating-stored-procedure-in-sqlite
SQL:1999 11.38 covers "Trigger definition". The Abstract syntax tree starts with the CREATE TRIGGER and ends in:
<triggered SQL statement> ::=
  <SQL procedure statement>
This is defined at 13.5 "SQL procedure statement", but that is humongous and I'm not sure what it is at all.
LevelDB Updated +Created
One "LevelDB" database contains multiple file in a directory. Off the bat inferior to SQLite which stores everything in a single file!
lujakob/nestjs-realworld-example-app SQLite port Updated +Created
Tried a quick port to SQLite to get rid of annoying local databases for development, but failed, at c1c2cc4e448b279ff083272df1ac50d20c3304fa
npm install sqlite3 --save-dev
and
{
  "type": "sqlite",
  "database": "db.sqlite3",
  "entities": ["src/**/**.entity{.ts,.js}"],
  "synchronize": true
}
then:
npm start
fails with:
DataTypeNotSupportedError: Data type "timestamp" in "ArticleEntity.created" is not supported by "sqlite" database.
Attempt to hack it:
--- a/src/article/article.entity.ts
+++ b/src/article/article.entity.ts
@@ -20,10 +20,10 @@ export class ArticleEntity {
   @Column({default: ''})
   body: string;

-  @Column({ type: 'timestamp', default: () => "CURRENT_TIMESTAMP"})
+  @Column({ default: () => "CURRENT_TIMESTAMP"})
   created: Date;

-  @Column({ type: 'timestamp', default: () => "CURRENT_TIMESTAMP"})
+  @Column({ default: () => "CURRENT_TIMESTAMP"})
   updated: Date;
and after that it seems to run.
I can signup and login, terrible error reporting as usual, make sure to use long enough usernames/passwords.
However, article creation fails with:
Unhandled Rejection (TypeError): Cannot read property 'slug' of undefined
Sequelize example Updated +Created
To run examples on a specific database:
  • ./index.js or ./index.js l: SQLite
  • ./index.js p: PostgreSQL. You must manually create a database called tmp and ensure that peer authentication works for it
All examples can be tested on all databases with:
cd sequelize
./test
Overview of the examples:
SQL 2D histogram Updated +Created
Let's try it on SQLite 3.40.1, Ubuntu 23.04. Data setup:
sqlite3 tmp.sqlite 'create table t(x integer, y integer)'
sqlite3 tmp.sqlite <<EOF
insert into t values
  (0, 0),
  (1, 1),
  (2, 2),
  (3, 3),
  (4, 4),
  (5, 5),
  (6, 6),
  (7, 7),
  (8, 8),
  (9, 9),
  (10, 10),
  (11, 11),
  (12, 12),
  (13, 13),
  (14, 14),
  (15, 15),
  (16, 16),
  (17, 17),
  (18, 18),
  (19, 19),

  (2, 18)
EOF
sqlite3 tmp.sqlite 'create index txy on t(x, y)'
For a bin size of 5 ignoring empty ranges we can:
sqlite3 tmp.sqlite <<EOF
select
  floor(x/5)*5 as x,
  floor(y/5)*5 as y,
  count(*) as cnt
from t
group by 1, 2
order by 1, 2
EOF
which produces the desired:
0|0|5
0|15|1
5|5|5
10|10|5
15|15|5
And to consider empty ranges we can use SQL genenerate_series + as per stackoverflow.com/questions/72367652/populating-empty-bins-in-a-histogram-generated-using-sql:
sqlite3 tmp.sqlite <<EOF
select x, y, sum(cnt) from (
  select
      floor(x/5)*5 as x,
      floor(y/5)*5 as y,
      count(*) as cnt
    from t
    group by 1, 2
  union
  select *, 0 as cnt from generate_series(0, 15, 5) inner join (select * from generate_series(0, 15, 5))
)
group by x, y
EOF
which outputs the desired:
0|0|5
0|5|0
0|10|0
0|15|1
5|0|0
5|5|5
5|10|0
5|15|0
10|0|0
10|5|0
10|10|5
10|15|0
15|0|0
15|5|0
15|10|0
15|15|5
SQL contiguous ranges Updated +Created
stackoverflow.com/questions/17046204/how-to-find-the-boundaries-of-groups-of-contiguous-sequential-numbers/17046749#17046749 just works, even in SQLite which supports all quoting types known to man including [] for compatibility with insane RDBMSs!
Here's a slightly saner version:
rm -f tmp.sqlite
sqlite3 tmp.sqlite "create table mytable (id integer primary key autoincrement, number integer, status integer)"
sqlite3 tmp.sqlite <<EOF
insert into mytable(number, status) values
  (100,0),
  (101,0),
  (102,0),
  (103,0),
  (104,1),
  (105,1),
  (106,0),
  (107,0),
  (1014,0),
  (1015,0),
  (1016,1),
  (1017,0)
EOF
sqlite3 tmp.sqlite <<EOF
SELECT
  MIN(id) AS "id",
  MIN(number) AS "from",
  MAX(number) AS "to"
FROM (
  SELECT ROW_NUMBER() OVER (ORDER BY number) - number AS grp, id, number
  FROM mytable
  WHERE status = 0
)
GROUP BY grp
ORDER BY MIN(number)
EOF
output:
1|100|103
7|106|107
9|1014|1015
12|1017|1017
To get only groups of length greater than 1:
sqlite3 tmp.sqlite <<EOF
SELECT "id", "from", "to", "to" - "from" + 1 as "len" FROM (
  SELECT
    MIN("id") AS "id",
    MIN(number) AS "from",
    MAX(number) AS "to"
  FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY "number") - "number" AS "grp", "id", "number"
    FROM "mytable"
    WHERE "status" = 0
  )
  GROUP BY "grp"
  ORDER BY MIN("number")
) WHERE "len" > 1
EOF
Output:
1|100|103|4
7|106|107|2
9|1014|1015|2
SQL example Updated +Created
We have some runnable SQL examples with assertion under the sequelize/raw directory.
These examples are written in the Sequelize library using raw queries.
Sequelize is used minimally, just to feed raw queries in transparently to any underlying database, and get minimally parsed results out for us, which we then assert with standard JavaScript. The queries themselves are all written by hand.
By default the examples run on SQLite. Just like the examples from sequelize example, you can set the database at runtime as:
  • ./index.js or ./index.js l: SQLite
  • ./index.js p: PostgreSQL. You must manually create a database called tmp and ensure that peer authentication works for it
Here we list only examples which we believe are standard SQL, and should therefore work across different SQL implementations:
SQL histogram Updated +Created
OK, there's a billion questions:
Let's try it on SQLite 3.40.1, Ubuntu 23.04. Data setup:
sqlite3 tmp.sqlite 'create table t(x integer)'
sqlite3 tmp.sqlite <<EOF
insert into t values (
  0,
  2,
  2,
  3,

  5,
  6,
  6,
  8,
  9,

  17,
)
EOF
sqlite3 tmp.sqlite 'create index tx on t(x)'
For a bin size of 5 ignoring empty ranges we can:
sqlite3 tmp.sqlite <<EOF
select floor(x/5)*5 as x,
       count(*) as cnt
from t
group by 1
order by 1
EOF
which produces the desired:
0|4
5|5
15|1
And to consider empty ranges we can use SQL genenerate_series + as per stackoverflow.com/questions/72367652/populating-empty-bins-in-a-histogram-generated-using-sql:
sqlite3 tmp.sqlite <<EOF
select x, sum(cnt) from (
  select floor(x/5)*5 as x,
         count(*) as cnt
    from t
    group by 1
  union
  select *, 0 as cnt from generate_series(0, 15, 5)
)
group by x
EOF
which outputs the desired:
0|4
5|5
10|0
15|1