Oracle Database Updated 2025-07-16
Often known simply as SQL Server, a terrible thing that makes it impossible to find portable SQL answers on Google! You just have to Google by specific SQL implementation unfortunately to find anything about the open source ones.
Optical microscope Updated 2025-07-16
Definition not very nice, as it excludes X-ray crystallography, which is also photon based.
Operon vs transcription unit Updated 2025-07-16
That single operon can produce two different mRNA transcription units:
The reason for this appears to be that there is a rho-independent termination region after thrL. But then under certain conditions, that must get innactivated, and then the thrLABC is produced instead.
Open X-Embodiment Updated 2025-07-16
GitHub describes the input quite well:
The model takes as input a RGB image from the robot workspace camera and a task string describing the task that the robot is supposed to perform.
What task the model should perform is communicated to the model purely through the task string. The image communicates to the model the current state of the world, i.e. assuming the model runs at three hertz, every 333 milliseconds, we feed the latest RGB image from a robot workspace camera into the model to obtain the next action to take.
TODO: how is the scenario specified?
TODO: any simulation integration to it?
https://web.archive.org/web/20250209172539if_/https://raw.githubusercontent.com/google-deepmind/open_x_embodiment/main/imgs/teaser.png
For those that know biology and just want to do the thing, see: Section "Protocols used".
The PuntSeq team uses an Oxford Nanopore MinION DNA sequencer made by Oxford Nanopore Technologies to sequence the 16S region of bacterial DNA, which is about 1500 nucleotides long.
This kind of "decode everything from the sample to see what species are present approach" is called "metagenomics".
This is how the MinION looks like: Figure 1. "Oxford Nanopore MinION top".
The 16S region codes for one of the RNA pieces that makes the bacterial ribosome.
Before sequencing the DNA, we will do a PCR with primers that fit just before and just after the 16S DNA, in well conserved regions expected to be present in all bacteria.
The PCR replicates only the DNA region between our two selected primers a gazillion times so that only those regions will actually get picked up by the sequencing step in practice.
Eukaryotes also have an analogous ribosome part, the 18S region, but the PCR primers are selected for targets around the 16S region which are only present in prokaryotes.
This way, we amplify only the 16S region of bacteria, excluding other parts of bacterial genome, and excluding eukaryotes entirely.
Despite coding such a fundamental piece of RNA, there is still surprisingly variability in the 16S region across different bacteria, and it is those differences will allow us to identify which bacteria are present in the river.
The variability exists because certain base pairs are not fundamental for the function of the 16S region. This variability happens mostly on RNA loops as opposed to stems, i.e. parts of the RNA that don't base pair with other RNA in the RNA secondary structure as shown at: Code 1. "RNA stem-loop structure".
                A-U
               /   \
A-U-C-G-A-U-C-G     C
| | | | | | | |     |
U-A-G-C-U-A-G-C     G
               \   /
                U-A
|             ||    |
+-------------++----+
    stem        loop
Code 1.
RNA stem-loop structure
.
This is how the 16S RNA secondary structure looks like in its full glory: Figure 5. "16S RNA secondary structure".
Since loops don't base pair, they are less crucial in the determination of the secondary structure of the RNA.
The variability is such that it is possible to identify individual species apart if full sequences are known with certainty.
With the experimental limitations of experiment however, we would only be able to obtain family or genus level breakdowns.
Open source software Updated 2025-07-16
What happens when the underdogs get together and try to factor out their efforts to beat some evil dominant power, sometimes victoriously.
Or when startups use the cheapest stuff available and randomly become the next big thing, and decide to keep maintaining the open stuff to get features for free from other companies, or because they are forced by the Holy GPL.
Open source frees employees. When you change jobs, a large part of the specific knowledge you acquired about closed source a project with your blood and tears goes to the trash. When companies get bought, projects get shut down, and closed source code goes to the trash. What sane non desperate person would sell their life energy into such closed source projects that could die at any moment? Working on open source is the single most important non money perk a company can have to attract the best employees.
Open source is worth more than the mere pragmatic financial value of not having to pay for software or the ability to freely add new features.
Its greatest value is perhaps the fact that it allows people study it, to appreciate the beauty of the code, and feel empowered by being able to add the features that they want.
That is why Ciro Santilli thought:
Life is too short for closed source.
But quoting Ciro's colleague S.:
Every software is open source when you read assembly code.
While software is the most developed open source technology available in the 2010's, due to the "zero cost" of copying it over the Internet, Ciro also believes that the world would benefit enormously from open source knowledge in all areas on science and engineering, for the same reasons as open source.
Open reading frame Updated 2025-07-16
Area between a start codon and an stop codon.
This term is useful because:
Open knowledge Updated 2025-07-16
Ciro Santilli's raison d'etre, one of his attempts: OurBigBook.com.
The outcome of closed knowledge is reverse engineering.
Open Images dataset Updated 2025-07-16
As of v7:
The images and annotations are both under CC BY, with Google as the copyright holder.
ONNX Updated 2025-07-16
The most important thing this project provides appears to be the .onnx file format, which represents ANN models, pre-trained or not.
Deep learning frameworks can then output such .onnx files for interchangeability and serialization.
Some examples:
The cool thing is that ONNX can then run inference in an uniform manner on a variety of devices without installing the deep learning framework used for. It's a bit like having a kind of portable executable. Neat.
One page to rule them all Updated 2025-07-16
It is true that one image is worth a thousand words, but unfortunately it is also true that one image takes up at least as much bytes as a thousand words!
Having one single page to rule them all is of course the ideal setup for a website, as you can Ctrl + F one ToC and quickly find what you want.
And, with Linux Kernel Module Cheat Ciro noticed that it is very hard to write so much intelligent prose that becomes larger than reasonable to load on a single webpage.
He then started using this technique for everything he writes, including this page and Chinese government.
However, if there are too many images on the page, the loading of the last images would take forever in case users want to view the last sections.
There are two solutions to that:
Ciro is still deciding between those two. The traditional approach works for sure but loses the one page to rule them all benefits.
The innovative approach will work for interactive viewing, but archive.org will fail to load the images for example, and there may be other unforseen consequences.
Wikimedia Commons is awesome and automatically converts and serves smaller versions of images, so always choose the smallest images size needed by the output document. Readers can then find the higher resolution versions by following the page source.
This also comes to mind: motherfuckingwebsite.com
zettelkasten.de/posts/overview/ from zettelkasten:
How many Zettelkästen should I have? The answer is, most likely, only one for the duration of your life. But there are exceptions to this rule.
OpenStax Updated 2025-07-16
These people have good intentions.
The problem is that they don't manage to go critical because there's to way for students to create content, everything is manually curated.
You can't even publicly comment on the textbooks. Or at least Ciro Santilli hasn't found a way to do so. There is just a "submit suggestion" box.
This massive lost opportunity is even shown graphically at: cnx.org/about (archive) where there is a clear separation between:
  • "authors", who can create content
  • "students", who can consume content
Maybe this wasn't the case in their legacy website, legacy.cnx.org/content?legacy=true, but not sure, and they are retiring that now.
Thus, OurBigBook.com. License: CC BY! So we could re-use their stuff!
TODO what are the books written in?
Video 1.
Richard Baraniuk on open-source learning by TED (2006)
Source.
Ollama Updated 2025-07-16
Ollama is a highly automated open source wrapper that makes it very easy to run multiple Open weight LLM models either on CPU or GPU.
Its README alone is of great value, serving as a fantastic list of the most popular Open weight LLM models in existence.
Install with:
curl https://ollama.ai/install.sh | sh
The below was tested on Ollama 0.1.14 from December 2013.
Download llama2 7B and open a prompt:
ollama run llama2
On P14s it runs on CPU and generates a few tokens per second, which is quite usable for a quick interactive play.
As mentioned at github.com/jmorganca/ollama/blob/0174665d0e7dcdd8c60390ab2dd07155ef84eb3f/docs/faq.md the downloads to under /usr/share/ollama/.ollama/models/ and ncdu tells me:
--- /usr/share/ollama ----------------------------------
    3.6 GiB [###########################] /.ollama
    4.0 KiB [                           ]  .bashrc
    4.0 KiB [                           ]  .profile
    4.0 KiB [                           ]  .bash_logout
The file:
/usr/share/ollama/.ollama/models/manifests/hf.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF/Q2_K
gives a the exact model name and parameters.
We can also do it non-interactively with:
/bin/time ollama run llama2 'What is quantum field theory?'
which gave me:
0.13user 0.17system 2:06.32elapsed 0%CPU (0avgtext+0avgdata 17280maxresident)k
0inputs+0outputs (0major+2203minor)pagefaults 0swaps
but note that there is a random seed that affects each run by default. ollama-expect is an attempt to make the output deterministic.
Some other quick benchmarks from Amazon EC2 GPU on a g4nd.xlarge instance which had an Nvidia Tesla T4:
0.07user 0.05system 0:16.91elapsed 0%CPU (0avgtext+0avgdata 16896maxresident)k
0inputs+0outputs (0major+1960minor)pagefaults 0swaps
and on Nvidia A10G in an g5.xlarge instance:
0.03user 0.05system 0:09.59elapsed 0%CPU (0avgtext+0avgdata 17312maxresident)k
8inputs+0outputs (1major+1934minor)pagefaults 0swaps
So it's not too bad, a small article in 10s.
It tends to babble quite a lot by default, but eventually decides to stop.

Unlisted articles are being shown, click here to show only listed articles.