- SQLite with
rowid
: stackoverflow.com/questions/8190541/deleting-duplicate-rows-from-sqlite-database - SQL Server has crazy "CTEs" change backing table extension: stackoverflow.com/questions/18390574/how-to-delete-duplicate-rows-in-sql-server
Demo under: nodejs/sequelize/raw/many_to_many.js.
NO way in the SQL standard apparently, but you'd hope that implementation status would be similar to UPDATE with JOIN, but not even!
- PostgreSQL: possible with
DELETE FROM USING
: stackoverflow.com/questions/11753904/postgresql-delete-with-inner-join - SQLite: not possible without subqueries as of 3.35 far: stackoverflow.com/questions/24511153/how-delete-table-inner-join-with-other-table-in-sqlite, Does not appear to have any relevant features at: www.sqlite.org/lang_delete.html
ORM
- Sequelize: no support of course: stackoverflow.com/questions/40890131/sequelize-destroy-record-with-join
@cirosantilli/_file/nodejs/sequelize/raw/nodejs/sequelize/raw/parallel_create_delete_empty_tag.js Updated 2024-12-15 +Created 1970-01-01
In this example, posts have tags. When a post is deleted, we check to see if there are now any empty tags, and now we want to delete any empty tags that the post deletion may have created.
If we are creating and deleting posts concurrently, a naive implementation might wrongly delete the tags of a newly created post.
This could be due to a concurrency issue of the following types.
Failure case 1:which would result in the new post incorrectly not having the
- thread 2: delete old post
- thread 2: find all tags with 0 posts. Finds
tag0
from the deleted old post which is now empty. - thread 1: create new post, which we want to have tag
tag0
- thread 1: try to create a new tag
tag0
, but don't because it already exists, this is done using SQLite'sINSERT OR IGNORE INTO
or PostgreSQL'sINSERT ... ON CONFLICT DO NOTHING
- thread 1: assign
tag0
to the new post by adding an entry to the join table - thread 2: delete all tags with 0 posts. It still sees from its previous search that
tag0
is empty, and deletes it, which then cascades into the join table
tag0
.Failure case 2:which leads to a foreign key failure, because the tag does not exist anymore when the assignment happens.
- thread 2: delete old post
- thread 2: find all tags with 0 posts
- thread 1: create new post
- thread 1: try to create a new tag
tag0
, but don't because it already exists - thread 2: delete all tags with 0 posts. It still sees from its previous search that
tag0
is empty, and deletes it - thread 1: assign
tag0
to the new post
Failure case 3:which leads to a foreign key failure, because the tag does not exist anymore when the assignment happens.
- thread 2: delete old post
- thread 1: create new post, which we want to have tag
tag0
- thread 1: try to create a new tag
tag0
, and succeed because it wasn't present - thread 2: find all tags with 0 posts, finds the tag that was just created
- thread 2: delete all tags with 0 posts, deleting the new tag
- thread 1: assign
tag0
to the new post
Sample executions:All executions use 2 threads.
node --unhandled-rejections=strict ./parallel_create_delete_empty_tag.js p 9 1000 'READ COMMITTED'
: PostgreSQL, 9 tags, DELETE/CREATE thetag0
test tag 1000 times, useREAD COMMITTED
Execution often fails, although not always. The failure is always:because the:error: insert or update on table "PostTag" violates foreign key constraint "PostTag_tagId_fkey"
tries to insert a tag that was deleted in the other thread, as it didn't have any corresponding posts, so this is the foreign key failure.INSERT INTO "PostTag"
TODO: we've never managed to observe the failure case in whichtag0
is deleted. Is it truly possible? And if not, by which guarantee?node --unhandled-rejections=strict ./parallel_create_delete_empty_tag.js p 9 1000 'READ COMMITTED' 'FOR UPDATE'
: do aSELECT ... FOR UPDATE
before trying toINSERT
.This is likely correct and the fastest correct method according to our quick benchmarking, about 20% faster thanREPEATABLE READ
.We are just now 100% sure it is corret becase we can't find out if theSELECT
in theDELETE
subquery could first select some rows, which are then locked by the tag creator, and only then locked byDELETE
after selection. Or does it re-evaludate theSELECT
even though it is in a subquery?node --unhandled-rejections=strict ./parallel_create_delete_empty_tag.js p 9 1000 'REPEATABLE READ'
: repeatable readWe've never observed any failures with this level. This should likely fix the foreign key issue according to the PostgreSQL docs, since:- the
DELETE "Post"
commit cannot start to be seen only in the middle of the thread 1 transaction - and then if DELETE happened, the thread 1 transaction will detect it, ROLLBACK, and re-run. TODO how does it detect the need rollback? Is it because of the foreign key? It is very hard to be sure about this kind of thing, just can't find the information. Related: postgreSQL serialization failure.
- the
node --unhandled-rejections=strict ./parallel_create_delete_empty_tag.js p 9 1000 'SERIALIZABLE'
: serializablenode --unhandled-rejections=strict ./parallel_create_delete_empty_tag.js p 9 1000 'NONE'
: magic value, don't use any transaction. Can blow up of course, since even less restrictions thanREAD COMMITTED
Some theoretical notes:
- Failure case 3 is averted by a
READ COMMITTED
transaction, because thread 2 won't see the uncommitted tag that thread 1 created, and therefore won't be able to delete it
stackoverflow.com/questions/10935850/when-to-use-select-for-update from SELECT FOR UPDATE also talks about a similar example, and has relevant answers.
Suppose we specify:The question then is, which transaction is encoded at that position of the file?
- a .dat file
- the offset in bytes within that file
This would allow us to index inscriptions in the .dat files directly with fast C tools, and then retrive the transaction ID to get cleaner data and metadata.
It should be possible if we managed to take the information from bitcoindev.network/understanding-the-data/ and dump into an indexed SQLite database.
I tried to start things off with LevelDBDumper:but that consumed all 64 GB of RAM on P51... github.com/mdawsonuk/LevelDBDumper/issues/15
LevelDBDumper -d ~/snap/bitcoin-core/common/.bitcoin/indexes/txindex -f btc.csv -q -o . -t csv
But OK, nevermind that repo, it can be done easily with the LevelDB API of any language: bitcoin.stackexchange.com/questions/121888/what-is-the-data-format-layout-for-txindex-leveldb-values. Just the data seems wrong and we don't know why.
- kimchi
- reverse debugging
- E Ink
- web archiving
- Buildroot
- integrated development environments
- degreaser
- UML: while it might seem like a over-thought thing and likely is, the basic idea that understanding "one to one vs one to many vs many to many" relationships between objects and which object can see which object, is a fantastic approach towards understanding complex object oriented code
- open source software, including open source scientific computing consultancies
- computer
- FOSDEM. Ciro Santilli attended in 2016, and felt extremely good together with all those amazingly smart open source hackers: www.quora.com/What-are-the-best-open-source-conferences/answer/Ciro-Santilli
- Sass
- vimium
- bisection
- vector graphics, notably scalable Vector Graphics
- ASCII art
- OAuth
- command-line interface
- virtualization
- Anusol
- autodidacticism and self-directed learning
- end-to-end encryption
- The Criterion Collection
- version control
- SQLite
- Guerrilla Mail
- POSIX
- static website
- Freeman Dyson
- open access academic publishers
- unconditional basic income
- transhumanism
- 2FA, and notably 2FA apps
- human-readable formats
- wealth tax
- Reproducible builds
- F-Droid
- Can't get you out of my head by Adam Curtis (2021)
- drug liberalization
- Wiki-binge
- molecular Sciences Course of the University of São Paulo
- meal deal
- clade, as opposed to taxonomic ranks
- lingua franca, see also: having more than one natural language is bad for the world
- rsync
- zip hoodies
How to decide if an ORM is decent? Just try to replicate every SQL query from nodejs/sequelize/raw/many_to_many.js on PostgreSQL and SQLite.
There is only a very finite number of possible reasonable queries on a two table many to many relationship with a join table. A decent ORM has to be able to do them all.
If it can do all those queries, then the ORM can actually do a good subset of SQL and is decent. If not, it can't, and this will make you suffer. E.g. Sequelize v5 is such an ORM that makes you suffer.
The next thing to check are transactions.
Basically, all of those come up if you try to implement a blog hello world world such as gothinkster/realworld correctly, i.e. without unnecessary inefficiencies due to your ORM on top of underlying SQL, and dealing with concurrency.
TODO what is the standard compliant syntax?
PostgreSQL requires you to define a SQL stored procedure: stackoverflow.com/questions/28149494/is-it-possible-to-create-trigger-without-execute-procedure-in-postgresql Their syntax may be standard compliant, not sure about the
EXECUTE
part. Their docs: www.postgresql.org/docs/current/sql-createtrigger.htmlSQLite does not support SQL stored procedures at all, so maybe that's why they can't be standard compliant here: stackoverflow.com/questions/3335162/creating-stored-procedure-in-sqlite
SQL:1999 11.38 covers "Trigger definition". The Abstract syntax tree starts with the
CREATE TRIGGER
and ends in:<triggered SQL statement> ::=
<SQL procedure statement>
This is defined at 13.5 "SQL procedure statement", but that is humongous and I'm not sure what it is at all.
One "LevelDB" database contains multiple file in a directory. Off the bat inferior to SQLite which stores everything in a single file!
Tried a quick port to SQLite to get rid of annoying local databases for development, but failed, at c1c2cc4e448b279ff083272df1ac50d20c3304faandthen:fails with:Attempt to hack it:and after that it seems to run.
npm install sqlite3 --save-dev
{
"type": "sqlite",
"database": "db.sqlite3",
"entities": ["src/**/**.entity{.ts,.js}"],
"synchronize": true
}
npm start
DataTypeNotSupportedError: Data type "timestamp" in "ArticleEntity.created" is not supported by "sqlite" database.
--- a/src/article/article.entity.ts
+++ b/src/article/article.entity.ts
@@ -20,10 +20,10 @@ export class ArticleEntity {
@Column({default: ''})
body: string;
- @Column({ type: 'timestamp', default: () => "CURRENT_TIMESTAMP"})
+ @Column({ default: () => "CURRENT_TIMESTAMP"})
created: Date;
- @Column({ type: 'timestamp', default: () => "CURRENT_TIMESTAMP"})
+ @Column({ default: () => "CURRENT_TIMESTAMP"})
updated: Date;
I can signup and login, terrible error reporting as usual, make sure to use long enough usernames/passwords.
However, article creation fails with:
Unhandled Rejection (TypeError): Cannot read property 'slug' of undefined
To run examples on a specific database:All examples can be tested on all databases with:
./index.js
or./index.js l
: SQLite./index.js p
: PostgreSQL. You must manually create a database calledtmp
and ensure that peer authentication works for it
cd sequelize
./test
Overview of the examples:
- nodejs/sequelize/index.js: a bunch of basic examples
- nodejs/sequelize/update.js: This file is also where we are storing our expression-foo for now, e.g. how to do stuff like
col1 + col2
. Such knowledge can however be used basically anywhere else however, e.g. inAS
orWHERE
clauses, not just inUPDATE
. - nodejs/sequelize/count.js: a simplified single-table count example. In practice, this will be usually done together with JOIN queries across multiple tables. Answers: stackoverflow.com/questions/22627258/how-does-group-by-works-in-sequelize/69896449#69896449
- nodejs/sequelize/date.js: automatic date typecasts
- nodejs/sequelize/like.js: LIKE
- nodejs/sequelize/camel_case.js: trying to get everything in the database camel cased, columns starting with lowercase, and tables starting with uppercase. The defaults documented on getting started documentation do uppercase foreign keys, and lowercase non-foreign keys. It's a mess.
- nodejs/sequelize/ignore_duplicates.js: ignore query on unique violation with
ignoreDuplicates: true
which does SQLiteINSERT OR IGNORE INTO
or PostgreSQLON CONFLICT DO NOTHING
. Closely related Upsert versions:- Upsert
- nodejs/sequelize/upsert.js:
.upsert
selects the conflict column automatically by unique columns. Both SQLite and PostgreSQL doINSERT INTO ON CONFLICT
. PostgreSQL usesRETURNING
, which was added too recently to SQLite: www.sqlite.org/lang_returning.htmlAt nodejs/sequelize/composite_index.js we have a.upsert
test with a composite index. Works just fine as you'd expect with a compositeON CONFLICT
. Well done. - nodejs/sequelize/update_on_duplicate.js:
.bulkCreate({}, { updateOnDuplicate: ['col1', col2'] }
. Produces queries analogous to.upsert
. This method is cool because it can upsert multiple columns at once. But it is annoying that you have to specify all fields to be updated one by one manually. - stackoverflow.com/questions/29063232/how-to-get-the-id-of-an-inserted-or-updated-record-in-sequelize-upsert/72092277#72092277
- stackoverflow.com/questions/55531860/sequelize-bulkcreate-updateonduplicate-for-postgresql
- Upsert
- nodejs/sequelize/inc.js: demonstrate the
increment
method. In SQLite, it produces a statement of type:UPDATE `IntegerNames` SET `value`=`value`+ 1,`updatedAt`='2021-11-03 10:23:45.409 +00:00' WHERE `id` = 3
- nodejs/sequelize/sync_alter.js: illustrates
Model.sync({alter: true})
to modify a table definition, answers: stackoverflow.com/questions/54898994/bulkupdate-in-sequelize-orm/69044138#69044138 - nodejs/sequelize/truncate_key.js
- nodejs/sequelize/validation.js: is handled by a third-party library: github.com/validatorjs/validator.js. They then add a few extra validators on top of that.The
args: true
thing is explained at: stackoverflow.com/questions/58522387/unhandled-rejection-sequelizevalidationerror-validation-error-cannot-create-pr/70263032#70263032 - nodejs/sequelize/composite_index.js: stackoverflow.com/questions/34664853/sequelize-composite-unique-constraint
- nodejs/sequelize/indent_log.js: stackoverflow.com/questions/34664853/sequelize-composite-unique-constraint
- association examples:
- nodejs/sequelize/one_to_many.js: basic one-to-many examples.
- nodejs/sequelize/many_to_many.js: basic many-to-many examples, each user can like multiple posts. Answers: stackoverflow.com/questions/22958683/how-to-implement-many-to-many-association-in-sequelize/67973948#67973948
- ORDER BY include:
- nodejs/sequelize/many_to_many_custom_table.js: many-to-many example, but where we craft our own table which can hold extra data. In our case, users can like posts, but likes have a integer weight associated with them. Related threads:
- nodejs/sequelize/many_to_many_same_model.js: association between a model and itself: users can follow other users. Related:
- nodejs/sequelize/many_to_many_same_model_super.js
- nodejs/sequelize/many_to_many_super.js: "Super many to many": sequelize.org/master/manual/advanced-many-to-many.html This should not exist and shows how bad this library is for associations, you need all that boilerplate in order to expose certain relationships that aren't otherwise exposed by a direct
hasMany
with implicit join table.
- nested includes to produce queries with multiple JOIN:
- nodejs/sequelize/nested_include.js: find all posts by users that a given user follows. Answers: stackoverflow.com/questions/42632943/sequelize-multiple-where-clause/68018083#68018083
- nodejs/sequelize/nested_include_super.js: like nodejs/sequelize/nested_include.js but with a super many to many. We should move this to nodejs/sequelize/many_to_many_super.js.
- two relationships between two specific tables: we need to use
as:
to disambiguate them- nodejs/sequelize/many_to_many_double.js: users can both follow and like posts
- nodejs/sequelize/one_to_many_double.js: posts have the author and a mandatory reviewer
- hooks
- internals:
- nodejs/sequelize/common.js: common utilities used across examples, most notably:
- to easily setup different DBRM
- nodejs/sequelize/min_nocommon.js: to copy paste to Stack Overflow
- nodejs/sequelize/min.js: template for new exapmles in the folder
- nodejs/sequelize/common.js: common utilities used across examples, most notably:
Let's try it on SQLite 3.40.1, Ubuntu 23.04. Data setup:
sqlite3 tmp.sqlite 'create table t(x integer, y integer)'
sqlite3 tmp.sqlite <<EOF
insert into t values
(0, 0),
(1, 1),
(2, 2),
(3, 3),
(4, 4),
(5, 5),
(6, 6),
(7, 7),
(8, 8),
(9, 9),
(10, 10),
(11, 11),
(12, 12),
(13, 13),
(14, 14),
(15, 15),
(16, 16),
(17, 17),
(18, 18),
(19, 19),
(2, 18)
EOF
sqlite3 tmp.sqlite 'create index txy on t(x, y)'
For a bin size of 5 ignoring empty ranges we can:which produces the desired:
sqlite3 tmp.sqlite <<EOF
select
floor(x/5)*5 as x,
floor(y/5)*5 as y,
count(*) as cnt
from t
group by 1, 2
order by 1, 2
EOF
0|0|5
0|15|1
5|5|5
10|10|5
15|15|5
And to consider empty ranges we can use SQL which outputs the desired:
genenerate_series
+ as per stackoverflow.com/questions/72367652/populating-empty-bins-in-a-histogram-generated-using-sql:sqlite3 tmp.sqlite <<EOF
select x, y, sum(cnt) from (
select
floor(x/5)*5 as x,
floor(y/5)*5 as y,
count(*) as cnt
from t
group by 1, 2
union
select *, 0 as cnt from generate_series(0, 15, 5) inner join (select * from generate_series(0, 15, 5))
)
group by x, y
EOF
0|0|5
0|5|0
0|10|0
0|15|1
5|0|0
5|5|5
5|10|0
5|15|0
10|0|0
10|5|0
10|10|5
10|15|0
15|0|0
15|5|0
15|10|0
15|15|5
stackoverflow.com/questions/17046204/how-to-find-the-boundaries-of-groups-of-contiguous-sequential-numbers/17046749#17046749 just works, even in SQLite which supports all quoting types known to man including
[]
for compatibility with insane RDBMSs!Here's a slightly saner version:
rm -f tmp.sqlite
sqlite3 tmp.sqlite "create table mytable (id integer primary key autoincrement, number integer, status integer)"
sqlite3 tmp.sqlite <<EOF
insert into mytable(number, status) values
(100,0),
(101,0),
(102,0),
(103,0),
(104,1),
(105,1),
(106,0),
(107,0),
(1014,0),
(1015,0),
(1016,1),
(1017,0)
EOF
sqlite3 tmp.sqlite <<EOF
SELECT
MIN(id) AS "id",
MIN(number) AS "from",
MAX(number) AS "to"
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY number) - number AS grp, id, number
FROM mytable
WHERE status = 0
)
GROUP BY grp
ORDER BY MIN(number)
EOF
output:
1|100|103
7|106|107
9|1014|1015
12|1017|1017
To get only groups of length greater than 1:
sqlite3 tmp.sqlite <<EOF
SELECT "id", "from", "to", "to" - "from" + 1 as "len" FROM (
SELECT
MIN("id") AS "id",
MIN(number) AS "from",
MAX(number) AS "to"
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY "number") - "number" AS "grp", "id", "number"
FROM "mytable"
WHERE "status" = 0
)
GROUP BY "grp"
ORDER BY MIN("number")
) WHERE "len" > 1
EOF
Output:
1|100|103|4
7|106|107|2
9|1014|1015|2
These examples are written in the Sequelize library using raw queries.
Sequelize is used minimally, just to feed raw queries in transparently to any underlying database, and get minimally parsed results out for us, which we then assert with standard JavaScript. The queries themselves are all written by hand.
By default the examples run on SQLite. Just like the examples from sequelize example, you can set the database at runtime as:
./index.js
or./index.js l
: SQLite./index.js p
: PostgreSQL. You must manually create a database calledtmp
and ensure that peer authentication works for it
Here we list only examples which we believe are standard SQL, and should therefore work across different SQL implementations:
- nodejs/sequelize/raw/index.js: basic hello world to demonstrate the setup and very simple functionality
- nodejs/sequelize/raw/many_to_many.js: illustrates many-to-many relations with JOIN. Contains:
- SQL transaction examples:
- nodejs/sequelize/raw/commit_error.js: stackoverflow.com/questions/27245101/why-should-we-use-rollback-in-sql-explicitly/27245234#27245234 and stackoverflow.com/questions/48277519/how-to-use-commit-and-rollback-in-a-postgresql-function/48277708#48277708 suggest that on PostgreSQL, once something fails inside a transaction, all queries in the current transaction are ignored, and
COMMIT
simply does aROLLBACK
. Let's check. Yup, true for Postgres, but false for SQLite, SQLite just happily runs anything it can, you really needROLLBACK
for it. - SQL isolation level example
- nodejs/sequelize/raw/commit_error.js: stackoverflow.com/questions/27245101/why-should-we-use-rollback-in-sql-explicitly/27245234#27245234 and stackoverflow.com/questions/48277519/how-to-use-commit-and-rollback-in-a-postgresql-function/48277708#48277708 suggest that on PostgreSQL, once something fails inside a transaction, all queries in the current transaction are ignored, and
- GROUP BY and SQL aggregate functions:
- nodejs/sequelize/raw/group_by_extra_column.js: let's see if it blows up or not on different DB systems,
sqlite3
Node.js package allows it:- github.com/sequelize/sequelize/issues/5481#issuecomment-964387232
- dba.stackexchange.com/questions/141594/how-select-column-does-not-list-in-group-by-clause/141600 says that it was allowed in SQL:1999 when there are no ambiguities due to constraints, e.g. when grouping by unique columns
- github.com/postgres/postgres/blob/REL_13_5/src/test/regress/sql/functional_deps.sql#L27 shows that PostgreSQL wants it to work for
UNIQUE NOT NULL
, but they just haven't implemented it as of 13.5, where it only works if you group byPRIMARY KEY
- dba.stackexchange.com/questions/158015/why-can-i-select-all-fields-when-grouping-by-primary-key-but-not-when-grouping-b also says that
UNIQUE NOT NULL
doesn't work. Dan Lenski then points to rationale mailing list thread:
- nodejs/sequelize/raw/group_by_max_full_row.js: here we try to get the full row of each group at which a given column reaches the max of the group
- Postgres: has
SELECT DISCINTCT ON
which works perfectly if you only want one row in case of multiple rows attaining the max.ON
is an extension to the standard unfortunately: www.postgresql.org/docs/9.3/sql-select.html#SQL-DISTINCT Docs specify that it always respectsORDER BY
when selecting the row.- stackoverflow.com/questions/586781/postgresql-fetch-the-row-which-has-the-max-value-for-a-column asks it without the multiple matches use case
- stackoverflow.com/questions/586781/postgresql-fetch-the-rows-which-have-the-max-value-for-a-column-in-each-group/587209#587209 also present in simpler form at stackoverflow.com/questions/121387/fetch-the-rows-which-have-the-max-value-for-a-column-for-each-distinct-value-of/123481#123481 gives a very nice OUTER JOIN only solution! Incredible, very elegant.
- dba.stackexchange.com/questions/171938/get-only-rows-with-max-group-value asks specifically the case of multiple matches to the max
- stackoverflow.com/questions/586781/postgresql-fetch-the-row-which-has-the-max-value-for-a-column asks it without the multiple matches use case
- SQLite:
- stackoverflow.com/questions/48326957/row-with-max-value-per-group-sqlite
- stackoverflow.com/questions/48326957/row-with-max-value-per-group-sqlite/48328243#48328243 teaches us that in SQLite min and max are magic and guarantee that the matching row is returned
- stackoverflow.com/questions/48326957/row-with-max-value-per-group-sqlite/72996649#72996649 Ciro Santilli uses the magic of
ROW_NUMBER
- stackoverflow.com/questions/17277152/sqlite-select-distinct-of-one-column-and-get-the-others/71924314#71924314 get any full row without specifying which, we teach how to specify
- code.djangoproject.com/ticket/22696 WONTFIXed
DISTINCT ON
- stackoverflow.com/questions/50846722/what-is-the-difference-between-postgres-distinct-vs-distinct-on/72997494#72997494
DISTINCT
vsDISTINCT ON
, somewhat related question
- stackoverflow.com/questions/50846722/what-is-the-difference-between-postgres-distinct-vs-distinct-on/72997494#72997494
- stackoverflow.com/questions/48326957/row-with-max-value-per-group-sqlite
- stackoverflow.com/questions/5803032/group-by-to-return-entire-row asks how to take the top N with distinct after order limit. I don't know how to do it in Postgres
- Postgres: has
- nodejs/sequelize/raw/most_frequent.js: illustrates a few variants of findind the mode, including across GROUP
- nodejs/sequelize/raw/group_by_max_n.js: get the top N in each group
- nodejs/sequelize/raw/group_by_extra_column.js: let's see if it blows up or not on different DB systems,
- order results in the same order as
IN
: - LIMIT by a running total: TODO links
OK, there's a billion questions:
- SQL Server
- stackoverflow.com/questions/485409/generating-a-histogram-from-column-values-in-a-database OP did not know the difference between count and histogram :-) But it's the number one Google result.
- stackoverflow.com/questions/19103991/create-range-bins-from-sql-server-table-for-histograms has a minor extra group by twist, but otherwise fine
- stackoverflow.com/questions/16268441/generate-histogram-in-sql-server
- SQLite
- stackoverflow.com/questions/67514208/how-to-optimise-creating-histogram-bins-in-sqlite perf only, benchmarking would be needed. SQLite.
- stackoverflow.com/questions/32155449/create-a-histogram-with-a-dynamic-number-of-partitions-in-sqlite variable bin size, same number of entries per bin
- stackoverflow.com/questions/60348109/histogram-for-time-periods-using-sqlite-regular-buckets-1h-wide time
- MySQL: stackoverflow.com/questions/1764881/getting-data-for-histogram-plot MySQL appears to extend
ROUND
to also round by integers:ROUND(numeric_value, -2)
, but this is not widely portable which is a shame - stackoverflow.com/questions/72367652/populating-empty-bins-in-a-histogram-generated-using-sql specifically asks about empty bins, which is amazing. Amazon Redshift dialect unfortunately, but answer provided works widely, and Redshift was forked from PostgreSQL, so there's hope. Those newb open source server focused projects that don't use AGPL!
Let's try it on SQLite 3.40.1, Ubuntu 23.04. Data setup:
sqlite3 tmp.sqlite 'create table t(x integer)'
sqlite3 tmp.sqlite <<EOF
insert into t values (
0,
2,
2,
3,
5,
6,
6,
8,
9,
17,
)
EOF
sqlite3 tmp.sqlite 'create index tx on t(x)'
For a bin size of 5 ignoring empty ranges we can:which produces the desired:
sqlite3 tmp.sqlite <<EOF
select floor(x/5)*5 as x,
count(*) as cnt
from t
group by 1
order by 1
EOF
0|4
5|5
15|1
And to consider empty ranges we can use SQL which outputs the desired:
genenerate_series
+ as per stackoverflow.com/questions/72367652/populating-empty-bins-in-a-histogram-generated-using-sql:sqlite3 tmp.sqlite <<EOF
select x, sum(cnt) from (
select floor(x/5)*5 as x,
count(*) as cnt
from t
group by 1
union
select *, 0 as cnt from generate_series(0, 15, 5)
)
group by x
EOF
0|4
5|5
10|0
15|1