Ciro Santilli @cirosantilli 37

 Incoming links: OurBigBook

Trillium Notes  Updated 2025-07-16  +Created 1970-01-01

Originally at github.com/zadam/trilium, then after development stopped the community took it up at: github.com/TriliumNext/Notes.

Tree based organization at last. Infinitely deep.

Amazing WYSIWYG, including maths and tables, plus insane plugins like canvas mode, and specific file formats like code/mermaid diagrams/drawing mode.

Intentionally or not, they've basically made an open source Notion, with the possible exception that Notion historically started on web and moved to the desktop, while Trillium went the other way round.

Version history with automatic snapshots at intervals. TODO how is it implemented? Do they just ZIP multiple versions?

No multiuser features. Except for that, could have been a good starting point of an online multiuser thing such as OurBigBook.com!

With Book Notes it is possible possible to see more than one page at a time on the output, which his a major feature of OurBigBook. But does it show on HTML export as well?

You can static HTML export any subtree by right clicking on it in the navigation tree.

Is there a CLI to export to HTML? github.com/zadam/trilium/issues/3012

HTML export keeps all data as HTML is their native format. This may be inherited from CKEditor. The files are mostly visible, but there is some CSS missing, it is not 100% like editor, notably math is broken. There is also a hosted way of exposing: github.com/zadam/trilium/wiki/Sharing.

trilium.rocks however has a very good export, it is just a question of how much they had to hacked things, source at: github.com/zerebos/trilium.rocks

The default tHTML export uses frame navigation, with a toc fixed on the left frame. Efficient, but not of this century.

There is no concept of user created unique text IDs: you can have the same headers in the same folders in the UI. It's not even a matter of scopes. On exports they are differentiated as 1_name, 2_name, etc.

./Trilium Demo/Books/To read/1_HR.md
./Trilium Demo/Books/To read/2_HR.md
./Trilium Demo/Books/To read/HR.md

Markdown export warns:

this preserves most of the formatting

Architecture: runs on local SQLite database via better-sqlite3. Data apparently stored in SQLite database at ~/.local/share/trilium-data, no raw files.

Markup is stored as HTML as seen from: sqlite3 document.db 'SELECT * from note_contents'. HTML is their native storage format, quite interesting. But this means it is not source centric, so any source editing would have to go via import/export. It can be done apparently: github.com/zadam/trilium/wiki/Markdown but involves shoving a ZIP around.

WYSIWYG based on ckeditor.com/ which is a dependency. It is kind of cool that the view in which you view the output is exactly the same as the one you edit in, and there is no intermediate format, just the HTML.

Math is KaTeX based.

It also runs on the browser via a server: github.com/zadam/trilium/wiki/Server-installation. And they have a paid service for it at: trilium.cc/. Quite impressive.

They have server to from desktop sync: github.com/zadam/trilium/wiki/synchronization. There is no conflict resolution, one of them wins randomly. But they have revision history, and anything lost will be in the revision history. They have so many features it is mind blowing.

Maintainer announced he would be slowing down development since January 2024: github.com/zadam/trilium/issues/4620?ref=selfh.st

 Read the full article

University is broken  Updated 2025-07-16  +Created 1970-01-01

 View more

You just have to spend a few minutes with students until they complain about the courses or teachers. And you just have to spend a few hours with teachers until they complain about the students or broader system.

University is broken, and everyone knows it. The only question now is finding a viable, "political cash flow positive" path, into something better.

Bibliography:

academeblog.org/2022/10/05/american-universities-are-going-to-implode/ American Universities Are Going to Implode (2022) by www.linkedin.com/in/jsgabin/
theimaginativeconservative.org/2020/04/higher-education-implode-alexander-zubatov.html Higher Education Is About to Implode by Alexander Zubatov (2020)
2024-05-28 www.theguardian.com/education/article/2024/may/28/i-see-little-point-uk-university-students-on-why-attendance-has-plummeted ‘I see little point’: UK university students on why attendance has plummeted
"There’s a bit of a feeling that there is just box-ticking going on [among students], and getting a degree at the end of it."
Give it to me, baby. OurBigBook incoming.

 Read the full article

Updates / Generating test data for full text search tests  Updated 2025-07-16  +Created 2024-12-23

 View more

I've been thinking lightly about adding full text search to OurBigBook.

For example, at docs.ourbigbook.com/news/article-and-topic-id-prefix-search article search was added, but it only finds if you search something that appears right at the start of a title, e.g. for:

Fundamental theorem of calculus

you'd get a hit for:

fundamental

but not for

calculus

To do this efficiently, we need full text search, which PostgreSQL implements.

But finding a clean way to generate test data for testing out the speedup was not so easy and exploration into this led me to publishing a few new slightly improved methods where Googlers can now find them:

unix.stackexchange.com/questions/97160/is-there-something-like-a-lorem-ipsum-generator/787733#787733 I propose a neat random "sentence" generator using common CLI tools like grep and sed and the pre-installed Ubuntu dictionary /usr/share/dict/american-english:
```
grep -v "'" /usr/share/dict/american-english |
shuf -r |
paste -d ' ' $(printf "%4s" | sed 's/ /- /g') |
sed -e 's/^$.$/\U\1/;s/$/./' |
head -n10000000 \
> lorem.txt
```
- to achieve that, I also proposed two superior "join every N lines" method for the CLI: stackoverflow.com/questions/25973140/joining-every-group-of-n-lines-into-one-with-bash/79257780#79257780, notably this awk poem:
  seq 10 | awk '{ printf("%s%s", NR == 1 ? "" : NR % 3 == 1 ? "\n" : " ", $0 ) } END { printf("\n") }'

stackoverflow.com/questions/3371503/sql-populate-table-with-random-data/79255281#79255281 I propose:

a clean PostgreSQL random string stored procedure that picks random characters from an allowed character list

CREATE OR REPLACE FUNCTION random_string(int) RETURNS TEXT as $$
select
string_agg(substr(characters, (random() * length(characters) + 1)::integer, 1), '') as random_word
from (values('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-    ')) as symbols(characters)
join generate_series(1, $1) on 1 = 1
$$ language sql;

first generating PostgreSQL data as CSV, and then importing the CSV into PostgreSQL as a more flexible method. This can also be done in a streaming fashion from stdin which is neat.
```
python generate_data.py 10 | psql mydb -c '\copy "mytable" FROM STDIN'
```

stackoverflow.com/questions/16020164/psqlexception-error-syntax-error-in-tsquery/79437030#79437030 regarding the safe generation of prefix search tsquery from user inputs without query errors, I've learned about websearch_to_tsquery and further highlighted a possible tsquery -> text -> tsquery approach that might be correct for prefix searches
stackoverflow.com/questions/67438575/fulltext-search-using-sequelize-postgres/79439253#79439253 I put everything together into a minimal Sequelize example, read for usage in OurBigBook

Finally I did a writeup summarizing PostgreSQL full text search: Section "PostgreSQL full-text search" and also dumped it at: www.reddit.com/r/PostgreSQL/comments/12yld1o/is_it_worth_using_postgres_builtin_fulltext/ for good measure.

 Read the full article

Updates / How the tech improved  Updated 2025-07-16  +Created 2025-03-08

 View more

In any case, the outcome of that is that the tech has improved. And I have done a relatively good job of clearly publishing any "more user visible" improvements to docs.ourbigbook.com/news and social media such as

though it is important to note that there have been more than one "fix a hard bug" weeks that were not published because they would just bore readers.

During this period the main focus has been on improving OurBigBook Web, i.e. the dynamic website that powers OurBigBook.com. There are two reasons for that:

Web is what has the OurBigBook topics feature for mind-melding, which is the killer feature of OurBigBook compared to other note taking apps and therefore deserves the highest levels of priority
Static website generation is an indispensable escape valve that ensures that your content can be published forever even if OurBigBook.com goes down one day, which it won't as long as I live. But the innovation is Web.
static website generation was closer to good enough, but web was much further and is fundamentally harder.
I'm extremely satisfied with OurBigBook static website generation and haven't touched it as much. It wasn't easy to reach this state, but I'm there.
But Web is a different and much more complex beast.
Making CLI software that will run on a person's local computer under full trust and building a bunch of HTML from lightweight markup in bulk is one thing.
But making a public dynamic website that has to continuously maintain a coherent database state on granular updates, while giving users some trust but not enough for them to blow everything up is on a totally different level. See e.g. the recent SPAM attack we've had to fend off.
Figure 1.
Screenshot showing voting manipulated SPAM as the most highly upvoted article on OurBigBook.com
. Source.
And then there's also the issue of front-end being mega-hard to get right.

As a result, Web is now way less buggy and much more usable.

If you look through the list of Web updates, there is nothing specifically mind blowing. The core ideas have largely crystallized, and we are just trying to making them click. I have a few more punches up my sleeve, but the core is decided.

Figure 2.
OurBigBook Web search
. Source. This is one of the many basic quality of life improvements that have been done on OurBigBook Web.

Figure 3.
OurBigBook Web article announcement
. Source. Another cute new feature, you can send an email to your followers about a new amazing article you created.

Web process has been somewhat slower than what I'd like. Of course, it is the case of any project that things are easily said than done. But there are two other main structural factors that have played into it:

I have my first baby now, and we're learning how to deal with that on the fly.
For example, we could have put him on childcare a bit earlier, but due to inexperience we've kept him a bit longer than we maybe should have.
Things are well sorted out now, but not matter how good your support system is, at the end of the day, and more often night, it is you the parents that have to deal with a lot of inevitable baby issues. Unless you want them to turn into psychopaths and drug addicts that is, which I don't. I've reached the point of semi failure middle age that the baby feels like my best moonshot.
All of this sets a fundamental limit on how many hours you can work per week.
But at least with the donations I was able to work on OurBigBook at all. Because if it weren't for that, I would have to focus entirely on the generic job instead and OurBigBook would have been put on hold.
the choice of Web stack. I was allured by Next.js. I can see the beauty and usefulness of a Node.js render front-end that also runs on backend and hydration. That is awesome.
But:
- React is insanely hard to learn and understand. Furthermore, it is also hard to understand the performance problem that it solves, and actually have a benchmark where this problem is solved faster than just delivering some HTML files with ad-hoc Js on top.
- the lack (or perhaps excess of shitty) actual web framework like Ruby on Rails and Django means that I have to rediscover the wheel many times over for all the essential support activities like testing, login and so one
At this point a rewrite is out of the question. I've managed to master things well enough to get a decent result, and given up on the few things that I couldn't for the life of me achieve, after documenting them very well for posterity of course.

Aside from Web, there was only one thing that received a significant improvement, and that was the OurBigBook VS Code extension. The extension is not perfect, and it is not the "final UI", which has to be some WYSIWYG implementation, and there are some fundamental limitations that cannot be overcome without patching VS Code itself. However, the extension is already extremely usable, and I'm writing this on it right now. Basics like syntax highlighting, jump to definition and autocomplete are very useful and usable.

 Read the full article

Updates / Metrics and rationales  Updated 2025-07-16  +Created 2025-03-08

 View more

Long story short, the project is so far a complete failure on the most important metric: number of regular users, which current sits at exactly one: myself.

There were notable users who found the project online and who actually tried to use the website for some content and provided extremely valuable feedback:

Unfortunately after the period of a few weeks they stopped using it to follow their other priorities instead. Which is of course totally fine, however sad.

I still believe that the OurBigBook Web feature is a significant tech innovation that could make the website go big.

I also believe that the project gets many fundamentals of braindumping right, notably the infinitely deep table of contents without forced scoping, e.g.:

- Mathematics
  - Calculus

does not make Calculus have an ID orr URL of mathematics/calculus, rather it's just calculus.

Figure 1.
Internal cross file internal link uses only the leaf ID `hilbert-space`.

But there is a fundamental difficulty in reaching critical mass to that self-sustaining point, as people don't seem to be convinced by these logical "my system is better" argument alone, as opposed to having them Google into stuff they need now and then understand that the project is awesome.

A closely related critical mass issue is that existing big multiuser knowledge base websites such as Stack Overflow and Wikipedia have a tremendous advantage on PageRank. No matter how useless a Wikipedia article about something is, it will always be on top of Google within a week of creation for title hits. And since the main goal of publishing your stuff is to get it seen, it makes much more sense for writers to publish on such existing websites whenever possible, because anywhere else it is way way less likely to be seen by anybody.

Even I end up writing way more on Stack Overflow than on OurBigBook as a programmer. But I still believe that there is a value to OurBigBook, for the usual reasons of:

it allows you to organize a more global view of a subject, i.e. a book. Even I write answers on Stack Overflow, I also tend to organize links to these answers in a structured ways here, see e.g. big topics such as SQL
deletionism and overly narrowness of allowed topics/style

Perhaps what saddens me the most is that even on GitHub stars/Twitter/Hacker news terms there is almost no interest in the project despite the fact that I consider that it has innovations, while many other note taking apps as well in the thousands of stars. Maybe I'm just delusional and all the tech that I'm doing is completely useless?

Part of the issue is probably linked to the fact that most other note taking apps focus on "help me organize my ideas so I can make more money" and often completely ignore "I want to publish my knowledge", and stuff that helps you make money is always easier to sell and promote.

OurBigBook on the other hand a huge focus on "I want to publish me knowledge". It aims almost single mindedly in being the best tool ever for that. However this doesn't make money for people, and therefore there are going to be way less potential users.

I do believe strongly that all it takes is a few users for the project to snowball. For some people, once you start braindumping, it is very addictive, and you never want to stop basically. So with only a few of those we can open large parts of undergrad knowledge to the world. But these people are few, and so far I haven't been able to find even a single one like me, and on top of that convince them that I have created the ultimate system for their knowledge publishing desires.

Another general lesson is that I should perhaps aimed for greater compatibility with existing systems such as Obsidian. Taking something that many people already know and use can have a huge impact on acceptance. E.g. anything that touches Obsidian can reach thousands of stars: github.com/KosmosisDire/obsidian-webpage-export. Note taking apps that aim for "markdown" compatibility also tend to fare better, even if in the end you inevitably have to extend the Markdown for some of your features. And WYSIWYG, which I want but don't have, is perhaps the ultimate familiarity.

Another issue compared to other platforms is that OurBigBook just came out late. Obsidian launched in 2020. Roam Research and Trillium Notes also came earlier. And it is hard to fight the advantage already gained by those on the "I'm going to take some personal notes" area. I do believe however that there a strong separation between "these are my personal notes" and "I want to publish these". Once you decide to publish your knowledge, you immediately start to write in a different way, and it is very hard to convert pre-existing "private" notes into ones suitable for public consumption.

 Read the full article

Updates / Post OurBigBook job search round 2025  Updated 2025-07-16  +Created 2025-05-07  2025-07-16

 View more

I shouldn't be doing this on funded OurBigBook time which is until the end of May, but I was getting too nervous and decided to start a casual job search to test the waters.

In particular I want to see if I can get past the HR lady step without toning down my online profiles. If nothing works out for the next round I'll be hiding anything too spicy like:

prominently seeking funding for OurBigBook on my LinkedIn profile
CIA 2010 covert communication websites references. This will be my first job hunt since I have published that article. Wish me luck.
gay Putin profile picture on Stack Overflow

Another interesting point is to see if French companies are more likely to reply given that Ciro Santilli studied at École Polytechnique which the French worship.

Figure 1.

Gay Putin, currently used in Ciro Santilli's Stack Overflow profile

. Ciro's profiles may be a bit too much for the HR ladies who reject his job applications on the spot. To be fair, perhaps not enough years of experience for certain applications and job hopping may have something to do with it too. But since they don't ever tell you anything not to get sued, we'll never know.

I'm looking in particular either for:

machine learning-adjacent jobs in companies that seem to be doing something that could further AGI, e.g. automatic code generation or robotics would be ideal
quantum computing
systems programming, which is what I actually have work experience with

I spent the last two weeks doing that:

one week browsing everything of interest in London and Paris and sending applications to anything that seemed both relevant and interesting. Maintaining an application list at: Section "Job application by Ciro Santilli".
one week on a very laborious but somewhat interesting take home exercise for Linux kernel engineer a Canonical, makers of Ubuntu.
I had a week to finish 5 practical coding and packaging questions, and I tried to do everything as perfectly as possible, but I somewhat underestimated the amount of work and wait needed to do everything and didn't manage to finish question 4 and missed 5. Oops let's see how that goes.
At least this had a few good outcomes for the Internet as I tried to document things as nicely as I could where they were missing from Google as usual:
- I re-tested Linux Kernel Module Cheat and made some small improvements. Things still worked from a Ubuntu 24.10 host (using Docker to Ubuntu 22.04), and I also checked that kernel 6.8 builds and GDB step debugs after adding the newly required config CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT, also mentioned that at: Why are there no debug symbols in my vmlinux when using gdb with /proc/kcore?
- I contributed some simple updates to github.com/martinezjavier/ldd3 getting it closer to work on Linux kernel v6.8. That repository aims to keep the venerable examples from Linux kernel module book LDD3 alive on newer kernels, and is a very good source for kernel module developers.
- How to compile a Linux kernel module?: wrote a quick Ciro-approved tutorial
- Dynamic array in Linux kernel module: I gave an educational example of a dynamic byte array (like std::string) using the kvmalloc family of allocators
- quickemu: this is a good emulator manager and I think I'll be using it for Ubuntu images when needed from now on. I wrote:
  How to run Ubuntu desktop on QEMU?: an introductory tutorial to the software as their README is not that good as is often the case. It's hard for project authors to predict what new users want or not. This is my second answer to this question, the previous one focusing on a more manual approach without third party helpers.
  How to share folder between guest/host? (Quickemu): I explained how to setup a 9p mount to share a directory between guest and host
- Error :: You must put some 'source' URIs in your sources.list: updated this answer for Ubuntu 24.04. This issue comes up when you want to do either of:
  sudo apt build-dep sudo apt source
  which don't work by default, and my answer explains how to do it from the GUI and CLI. The CLI method is specially important for Docker images. Since Ubuntu doesn't offer a stable CLI method for this, the method breaks from time to time and we have to find the new config file to edit.
- What is hardware enablement (HWE)?: I learned a bit better how Ubuntu structures its kernel releases for each Ubuntu release
  Figure 2.
  Linux kernel version used for each Ubuntu release
  . Source. In particular, Ubuntu has HWE kernels which are updated kernels for older releases. E.g.
  24.04.0 and 24.04.1 had kernel 6.8
  24.04.2 moved up to kernel 6.11, the same one used in 24.10, to support newer hardware
Some of the main issues I had were:
- compiling Linux kernel for Ubuntu is extremely slow. I was used to compiling for embedded system with Buildroot, which finishes in minutes, but for Ubuntu is hours, presumably because they enable as many drivers as possible to make a single ISO work on as many different computers as possible, which makes sense, but also makes development harder
- my QEMU setup for Ubuntu was not quite as streamlined and I relearned a few things and set up quickemu. By chance I had recently come across quickemu for testing OurBigBook on MacOS, but I had to learn a bit how to set it up reasonably too

I'll make sure to add two weeks of OurBigBook work after May to make up for this.

 Read the full article

Updates / Quick fun with the Common Crawl web graph  Updated 2025-07-16  +Created 2025-02-26

 View more

github.com/cirosantilli/cirosantilli.github.io/issues/198. Previously at: stackoverflow.com/questions/31321009/best-more-standard-graph-representation-file-format-graphson-gexf-graphml/79467334#79467334 but Stack Overflow fucking deleted the question.

I wanted to do a quick exploration of open PageRank implementation and data.

My general motivation for this is that a PageRank-like algorithm could be useful for more accurate user and article ranking on OurBigBook, see: Section "PageRank-like ranking"

But it could also be just generally cool to apply it to other graph datasets, e.g. for computing an Wikipedia internal PageRank.

A quick Google reveals only Open PageRank, but their methods are apparently closed source.

Then I had a look at the Common Crawl web graph data to see if I could easily calculate it myself, and... they already have it! See: Section "Common Crawl web graph official PageRank"

Their graph dumps are in BVGraph graph file format, which is the native format of the WebGraph framework, which implements the format and algorithms such as PageRank.

The only thing I miss is a command line interface to calculate the PageRank. That would be so awesome.

The more I look at it the more I love Common Crawl.

Announcements:

In cc-main-2024-25-dec-jan-feb-domain-ranks.txt:

cirosantilli.com was ranked ~453k
ourbigbook.com was at ~606k

 Read the full article