cirosantilli.com content uploaded to ourbigbook.com/cirosantilli Updated +Created
Managed to upload the content from the static website cirosantilli.com (OurBigBook Markup source at github.com/cirosantilli/cirosantilli.github.io) to ourbigbook.com/cirosantilli.
Although most of the key requirements were already in place since the last update, as usual doing things with the complex reference content stresses the system further and leads to the exposition of several new bugs.
The upload of OurBigBook Markup files to ourbigbook.com was done with the newly added OurBigBook CLI ourbigbook --web option. Although fully exposed to end users, the setup is not super efficient: a trully decent implementation should only upload changed files, and would basically mean reimplementing/using Git, since version diffing is what Git shines at. But I've decided not to put much emphasis on CLI upload for now, since it is expected that initially the majority of users will use the Web UI only. The functionality was added primarily to upload the reference content.
This is a major milestone, as the new content can start attracting new users, and makes the purpose of the website much clearer. Just having this more realistic content also immediately highlighted what the next development steps need to be.
Once v1.0 is reached, I will actually make all internal links of cirosantilli.com to point to ourbigbook.com/cirosantilli to try and drive some more traffic.
The new content blows up by far the limit of the free Heroku PostgreSQL database of 10k lines. This meant that I needed to upgrade the Heroku Postgres plugin from the free Hobby Dev to the 9 USD/month Hobby Basic: elements.heroku.com/addons/heroku-postgresql, so now hosting costs will increase from 7 USD/month for the dyno to 7 + 9 = 16 UDS/month. After this upgrade and uploading all of cirosantilli.com to ourbigbook.com, Heroku dashboard reads reads:
  • 30,918 rows out of 10,000,000
  • 61.0 MB (out of 10 GB)
so clearly if we are ever forced to upgrade plans again, it means that a bunch of people are using the website and that things are going very very well! Happy how this storage cost turned out so far.
One key limitation found was that Heroku RAM memory is quite limited at 512MB, and JavaScript is not exactly the most memory economical language out there. Started investigation at: github.com/ourbigbook/ourbigbook/issues/230 Initially working around that by simply splitting the largest files. We were just on the verge of what could be ran however luckily, so a few dozen splits was enough, it managed to handle 70 kB OurBigBook Markup inputs. So hopefully if we manage to optimize a bit more we will be able to set a maximum size of 100 kB and still have a good safety margin.
SQL example Updated +Created
We have some runnable SQL examples with assertion under the sequelize/raw directory.
These examples are written in the Sequelize library using raw queries.
Sequelize is used minimally, just to feed raw queries in transparently to any underlying database, and get minimally parsed results out for us, which we then assert with standard JavaScript. The queries themselves are all written by hand.
By default the examples run on SQLite. Just like the examples from sequelize example, you can set the database at runtime as:
  • ./index.js or ./index.js l: SQLite
  • ./index.js p: PostgreSQL. You must manually create a database called tmp and ensure that peer authentication works for it
Here we list only examples which we believe are standard SQL, and should therefore work across different SQL implementations:
SQL histogram Updated +Created
OK, there's a billion questions:
Let's try it on SQLite 3.40.1, Ubuntu 23.04. Data setup:
sqlite3 tmp.sqlite 'create table t(x integer)'
sqlite3 tmp.sqlite <<EOF
insert into t values (
  0,
  2,
  2,
  3,

  5,
  6,
  6,
  8,
  9,

  17,
)
EOF
sqlite3 tmp.sqlite 'create index tx on t(x)'
For a bin size of 5 ignoring empty ranges we can:
sqlite3 tmp.sqlite <<EOF
select floor(x/5)*5 as x,
       count(*) as cnt
from t
group by 1
order by 1
EOF
which produces the desired:
0|4
5|5
15|1
And to consider empty ranges we can use SQL genenerate_series + as per stackoverflow.com/questions/72367652/populating-empty-bins-in-a-histogram-generated-using-sql:
sqlite3 tmp.sqlite <<EOF
select x, sum(cnt) from (
  select floor(x/5)*5 as x,
         count(*) as cnt
    from t
    group by 1
  union
  select *, 0 as cnt from generate_series(0, 15, 5)
)
group by x
EOF
which outputs the desired:
0|4
5|5
10|0
15|1
SQL REPEATABLE READ isolation level Updated +Created
nodejs/sequelize/raw/parallel_create_delete_empty_tag.js is an example which experimentally seems to be solved by REAPEATABLE READ, although we are not sure that this is truly the case and why. What is clear is that that example is not solved by the SQL READ COMMITTED isolation level.
In PostgreSQL, this is the first isolation level which can lead to postgreSQL serialization failures, this does not happen to SQL READ COMMITTED isolation level in that DBMS. You then have to retry the transaction.
SQL standard Updated +Created
A quick look at PostgreSQL's compliance notes: www.postgresql.org/docs/13/features.html shows the complete utter mess that this standard is. Multiple compliance levels that no one fully implements and optional features everywhere.
SQL transaction isolation level Updated +Created
Each transaction isolation level specifies what can or cannot happen when two queries are being run in parallel, i.e.: the memory semantics of the system.
Remember that queries can affects thousands of rows, and database systems like PostgreSQL can run multiple such queries at the same time.
Implementation specifics:
SQL TRIGGER Updated +Created
SQL's implementation of database triggers.
This feature is really cool, as it allows you to keep caches up to date!
In particular, everything that happens in a trigger happens as if it were in a transaction. This way, you can do less explicit transactions when you use triggers. It is a bit like the advantages of SQL CASCADE.
Generating test data for full text search tests Updated +Created
I've been thinking lightly about adding full text search to OurBigBook.
For example, at docs.ourbigbook.com/news/article-and-topic-id-prefix-search article search was added, but it only finds if you search something that appears right at the start of a title, e.g. for:
Fundamental theorem of calculus
you'd get a hit for:
fundamental
but not for
calculus
To do this efficiently, we need full text search, which PostgreSQL implements.
But finding a clean way to generate test data for testing out the speedup was not so easy and exploration into this led me to publishing a few new slightly improved methods where Googlers can now find them:
Finally I did a writeup summarizing PostgreSQL full text search: Section "PostgreSQL full-text search" and also dumped it at: www.reddit.com/r/PostgreSQL/comments/12yld1o/is_it_worth_using_postgres_builtin_fulltext/ for good measure.