For those that know biology and just want to do the thing, see: Section "Protocols used".
The PuntSeq team uses an Oxford Nanopore MinION DNA sequencer made by Oxford Nanopore Technologies to sequence the 16S region of bacterial DNA, which is about 1500 nucleotides long.
This kind of "decode everything from the sample to see what species are present approach" is called "metagenomics".
This is how the MinION looks like: Figure 1. "Oxford Nanopore MinION top".
Before sequencing the DNA, we will do a PCR with primers that fit just before and just after the 16S DNA, in well conserved regions expected to be present in all bacteria.
The PCR replicates only the DNA region between our two selected primers a gazillion times so that only those regions will actually get picked up by the sequencing step in practice.
Eukaryotes also have an analogous ribosome part, the 18S region, but the PCR primers are selected for targets around the 16S region which are only present in prokaryotes.
This way, we amplify only the 16S region of bacteria, excluding other parts of bacterial genome, and excluding eukaryotes entirely.
Despite coding such a fundamental piece of RNA, there is still surprisingly variability in the 16S region across different bacteria, and it is those differences will allow us to identify which bacteria are present in the river.
The variability exists because certain base pairs are not fundamental for the function of the 16S region. This variability happens mostly on RNA loops as opposed to stems, i.e. parts of the RNA that don't base pair with other RNA in the RNA secondary structure as shown at: Code 1. "RNA stem-loop structure".
| | | | | | | | |
| || |
This is how the 16S RNA secondary structure looks like in its full glory: Figure 5. "16S RNA secondary structure".
Since loops don't base pair, they are less crucial in the determination of the secondary structure of the RNA.
The variability is such that it is possible to identify individual species apart if full sequences are known with certainty.
With the experimental limitations of experiment however, we would only be able to obtain family or genus level breakdowns.
At the time of the experiment, Illumina equipment was cheaper per base pair and dominates the human genome sequencing market, but it required a much higher initial investment for the equipment (TODO how much).
The reusable Nanopore device costs just about 500 dollars, and about 500 dollars (50 unit volume) for the single usage flow cell which can decode up to 30 billion base pairs, which is about 10 human genomes 1x! Note that 1x is basically useless for one of the most important of all applications of sequencing: detection of single-nucleotide polymorphisms, since the error rate would be too high to base clinical decisions on.
Compare that to Illumina which is currently doing about an 1000 dollar human genome at 30x, and a bit less errors per base pair (TODO how much).
Other advantages of the MinION over Illumina which didn't really matter to this particular experiment are:
- portability for e.g. to do analysis on the field near infections outbreaks. Compare that to the smallest Illumina sequencer currently available in 2019, the iSeq 100: Figure 1. "Illumina iSeq 100 DNA sequencer".
- long reads which can be necessary for long repetitive regions, see also: Section "Sequence alignment"