Why Oxford Nanopore was used instead of Illumina for the sequencing by Ciro Santilli 35 Updated +Created
At the time of the experiment, Illumina equipment was cheaper per base pair and dominates the human genome sequencing market, but it required a much higher initial investment for the equipment (TODO how much).
The reusable Nanopore device costs just about 500 dollars, and about 500 dollars (50 unit volume) for the single usage flow cell which can decode up to 30 billion base pairs, which is about 10 human genomes 1x! Note that 1x is basically useless for one of the most important of all applications of sequencing: detection of single-nucleotide polymorphisms, since the error rate would be too high to base clinical decisions on.
Compare that to Illumina which is currently doing about an 1000 dollar human genome at 30x, and a bit less errors per base pair (TODO how much).
Other advantages of the MinION over Illumina which didn't really matter to this particular experiment are:
Mineral by Ciro Santilli 35 Updated +Created
Klystron by Ciro Santilli 35 Updated +Created
Europe by Ciro Santilli 35 Updated +Created
For the most part, a great pseudo-country to live in with lots of cultural diversity, art and safety.
However, Europe is in economic decline after all its Jewish and German geniuses fled in/after World War II and due to having more than one natural language is bad for the world.
Hund's first rule by Ciro Santilli 35 Updated +Created
Higher spin multiplicity means lower energy. I.e.: you want to keep all spins pointin in the same direction.
Brewster's angle by Ciro Santilli 35 Updated +Created
Fresnel equations by Ciro Santilli 35 Updated +Created
Overview of the experiment by Ciro Santilli 35 Updated +Created
For those that know biology and just want to do the thing, see: Section "Protocols used".
The PuntSeq team uses an Oxford Nanopore MinION DNA sequencer made by Oxford Nanopore Technologies to sequence the 16S region of bacterial DNA, which is about 1500 nucleotides long.
This kind of "decode everything from the sample to see what species are present approach" is called "metagenomics".
This is how the MinION looks like: Figure 1. "Oxford Nanopore MinION top".
Figure 1.
Oxford Nanopore MinION top
. Source.
Figure 2.
Oxford Nanopore MinION side
. Source.
Figure 3.
Oxford Nanopore MinION top open
. Source.
Figure 4.
Oxford Nanopore MinION side USB
. Source.
The 16S region codes for one of the RNA pieces that makes the bacterial ribosome.
Before sequencing the DNA, we will do a PCR with primers that fit just before and just after the 16S DNA, in well conserved regions expected to be present in all bacteria.
The PCR replicates only the DNA region between our two selected primers a gazillion times so that only those regions will actually get picked up by the sequencing step in practice.
Eukaryotes also have an analogous ribosome part, the 18S region, but the PCR primers are selected for targets around the 16S region which are only present in prokaryotes.
This way, we amplify only the 16S region of bacteria, excluding other parts of bacterial genome, and excluding eukaryotes entirely.
Despite coding such a fundamental piece of RNA, there is still surprisingly variability in the 16S region across different bacteria, and it is those differences will allow us to identify which bacteria are present in the river.
The variability exists because certain base pairs are not fundamental for the function of the 16S region. This variability happens mostly on RNA loops as opposed to stems, i.e. parts of the RNA that don't base pair with other RNA in the RNA secondary structure as shown at: Code 1. "RNA stem-loop structure".
                A-U
               /   \
A-U-C-G-A-U-C-G     C
| | | | | | | |     |
U-A-G-C-U-A-G-C     G
               \   /
                U-A
|             ||    |
+-------------++----+
    stem        loop
Code 1.
RNA stem-loop structure
.
This is how the 16S RNA secondary structure looks like in its full glory: Figure 5. "16S RNA secondary structure".
height=800
Figure 5.
16S RNA secondary structure
. Source.
Since loops don't base pair, they are less crucial in the determination of the secondary structure of the RNA.
The variability is such that it is possible to identify individual species apart if full sequences are known with certainty.
With the experimental limitations of experiment however, we would only be able to obtain family or genus level breakdowns.
Bioinformatics by Ciro Santilli 35 Updated +Created
Because Ciro's a software engineer, and he's done enough staring in computers for a lifetime already, and he believes in the power of Git, he didn't pay much attention to this part ;-)
According to the eLife paper, the code appears to have been uploaded to: github.com/d-j-k/puntseq. TODO at least mention the key algorithms used more precisely.
Ciro can however see that it does present interesting problems!
Because it was necessary to wait for 2 days to get our data, the workshop first reused sample data from previous collections done earlier in the year to illustrate the software.
First there is some signal processing/machine learning required to do the base calling, which is not trivial in the Oxford Nanopore, since neighbouring bases can affect the signal of each other. This is mostly handled by Oxford Nanopore itself, or by hardcore programmers in the field however.
After the base calling was done, the data was analyzed using computer programs that match the sequenced 16S sequences to a database of known sequenced species.
This is of course not just a simple direct string matching problem, since like any in experiment, the DNA reads have some errors, so the program has to find the best match even though it is not exact.
The PuntSeq team would later upload the data to well known open databases so that it will be preserved forever! When ready, a link to the data would be uploaded to: www.puntseq.co.uk/data
sqlite3 Node.js package by Ciro Santilli 35 Updated +Created
Includes its own copy of sqlite3, you don't use the system one, which is good to ensure compatibility. The version is shown at: github.com/mapbox/node-sqlite3/blob/918052b538b0effe6c4a44c74a16b2749c08a0d2/deps/common-sqlite.gypi#L3 SQLite source is tracked compressed in-tree: github.com/mapbox/node-sqlite3/blob/918052b538b0effe6c4a44c74a16b2749c08a0d2/deps/sqlite-autoconf-3360000.tar.gz horrendous. This explains why it takes forever to clone that repository. People who don't believe in git submodules, there's even an official Git mirror at: github.com/sqlite/sqlite
It appears to spawn its own threads via its C extension (since JavaScript is single threaded and and SQLite is not server-based), which allows for parallel queries using multiple threads: github.com/mapbox/node-sqlite3/blob/v5.0.2/src/threading.h
Hello world example: nodejs/node-sqlite3/index.js.
As of 2021, this had slumped back a bit, as maintainers got tired. Unmerged pull requests started piling more, and better-sqlite3 Node.js package started pulling ahead a little.
Ultra High Frequency by Ciro Santilli 35 Updated +Created
Exception to the Madelung energy ordering rule by Ciro Santilli 35 Updated +Created
The most notable exception is the borrowing of 3d-orbital electrons to 4s as in chromium, leading to a 3d5 4s1 configuration instead of the 3d4 4s2 we would have with the rule. TODO how is that observed observed experimentally?
Faraday effect by Ciro Santilli 35 Updated +Created

There are unlisted articles, also show them or only show them.