= SQL histogram
{c}
OK, there's a billion questions:
* <SQL Server>
* https://stackoverflow.com/questions/485409/generating-a-histogram-from-column-values-in-a-database OP did not know the difference between count and histogram :-) But it's the number one Google result.
* https://stackoverflow.com/questions/19103991/create-range-bins-from-sql-server-table-for-histograms has a minor extra group by twist, but otherwise fine
* https://stackoverflow.com/questions/16268441/generate-histogram-in-sql-server
* <SQLite>
* https://stackoverflow.com/questions/67514208/how-to-optimise-creating-histogram-bins-in-sqlite perf only, benchmarking would be needed. <SQLite>.
* https://stackoverflow.com/questions/32155449/create-a-histogram-with-a-dynamic-number-of-partitions-in-sqlite variable bin size, same number of entries per bin
* https://stackoverflow.com/questions/60348109/histogram-for-time-periods-using-sqlite-regular-buckets-1h-wide time
* <MySQL>: https://stackoverflow.com/questions/1764881/getting-data-for-histogram-plot MySQL appears to extend `ROUND` to also round by integers: `ROUND(numeric_value, -2)`, but this is not widely portable which is a shame
* https://stackoverflow.com/questions/72367652/populating-empty-bins-in-a-histogram-generated-using-sql specifically asks about empty bins, which is amazing. <Amazon Redshift> dialect unfortunately, but answer provided works widely, and Redshift was forked from <PostgreSQL>, so there's hope. Those newb <open source> server focused projects that don't use <AGPL>!
Let's try it on <SQLite> 3.40.1, <Ubuntu 23.04>. Data setup:
``
sqlite3 tmp.sqlite 'create table t(x integer)'
sqlite3 tmp.sqlite <<EOF
insert into t values (
0,
2,
2,
3,
5,
6,
6,
8,
9,
17,
)
EOF
sqlite3 tmp.sqlite 'create index tx on t(x)'
``
For a bin size of 5 ignoring empty ranges we can:
``
sqlite3 tmp.sqlite <<EOF
select floor(x/5)*5 as x,
count(*) as cnt
from t
group by 1
order by 1
EOF
``
which produces the desired:
``
0|4
5|5
15|1
``
And to consider empty ranges we can use <SQL genenerate_series> + as per https://stackoverflow.com/questions/72367652/populating-empty-bins-in-a-histogram-generated-using-sql[]:
``
sqlite3 tmp.sqlite <<EOF
select x, sum(cnt) from (
select floor(x/5)*5 as x,
count(*) as cnt
from t
group by 1
union
select *, 0 as cnt from generate_series(0, 15, 5)
)
group by x
EOF
``
which outputs the desired:
``
0|4
5|5
10|0
15|1
``
Back to article page