Source: /cirosantilli/oxford-nanopore-river-bacteria/overview-of-the-experiment

= Overview of the experiment

For those who know biology and just want to do the thing, see: <protocols used>{full}.

The PuntSeq team uses an <Oxford Nanopore MinION> <DNA sequencing>[DNA sequencer] made by <Oxford Nanopore Technologies> to sequence the <16S ribosomal RNA>[16S] region of bacterial <DNA>, which is about 1500 nucleotides long.

This kind of "decode everything in the sample to see which species are present" approach is called <metagenomics>.

This is what the MinION looks like: <image Oxford Nanopore MinION top>{full}.

\Image[https://upload.wikimedia.org/wikipedia/commons/thumb/5/57/Oxford_Nanopore_MinION_top_cropped.jpg/392px-Oxford_Nanopore_MinION_top_cropped.jpg]
{title=Oxford Nanopore MinION top}

\Image[https://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Oxford_Nanopore_MinION_side_cropped.jpg/191px-Oxford_Nanopore_MinION_side_cropped.jpg]
{title=Oxford Nanopore MinION side}

\Image[https://upload.wikimedia.org/wikipedia/commons/thumb/0/0a/Oxford_Nanopore_MinION_top_open_cropped.jpg/110px-Oxford_Nanopore_MinION_top_open_cropped.jpg]
{height=500}
{title=Oxford Nanopore MinION top open}

\Image[https://upload.wikimedia.org/wikipedia/commons/thumb/0/0f/Oxford_Nanopore_MinION_side_USB_cropped.jpg/597px-Oxford_Nanopore_MinION_side_USB_cropped.jpg]
{title=Oxford Nanopore MinION side USB}

The 16S region codes for one of the <RNA> pieces that make up the https://en.wikipedia.org/w/index.php?title=Ribosome&oldid=912600990\#Bacterial_ribosomes[bacterial ribosome].

Before <sequencing>[sequencing the DNA], we will do a <PCR> with primers that bind at either end of the 16S DNA, in well-conserved regions expected to be present in all bacteria.

The PCR copies only the DNA region between our two selected primers, a gazillion times over, so that in practice only those regions get picked up by the sequencing step.
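The "gazillion times" comes from the fact that, in the ideal case, each PCR cycle doubles the number of copies of the target region, so after $n$ cycles a single template becomes $2^n$ copies, while DNA outside the primers is not multiplied. A minimal sketch of that arithmetic, assuming a hypothetical per-cycle `efficiency` parameter (1.0 meaning perfect doubling) and ignoring the plateau that real reactions reach once reagents run out; the 30 cycles below are illustrative, not a value taken from the PuntSeq protocol:

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Expected number of amplicon copies after `cycles` rounds of PCR.

    Each cycle multiplies the copy count by (1 + efficiency): with perfect
    efficiency (1.0) every copy is duplicated, so the count doubles per cycle.
    This simple model ignores reagent exhaustion (the PCR plateau).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# One starting 16S molecule after an illustrative 30 perfect cycles.
print(pcr_copies(1, 30))  # 1073741824.0
```

So each starting 16S molecule ends up as a bit over a billion copies, whereas a stretch of genome outside the primers stays at its original count, which is why the sequencer ends up seeing almost only 16S.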

<eukarya>[Eukaryotes] also have an analogous ribosomal RNA, the 18S, but our PCR primers are chosen to target sequences around the 16S region that are only present in prokaryotes.

This way, we amplify only the 16S region of bacteria, excluding the rest of the bacterial genome, and excluding eukaryotes entirely.
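The primer selection logic can be illustrated with a toy "in silico PCR": a template only yields a product if the forward primer matches it and the reverse complement of the reverse primer matches further downstream, since the reverse primer anneals to the opposite strand. This is a minimal sketch with exact matching and made-up short primers and templates; real primers are longer, may contain degenerate bases, and tolerate some mismatches.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the amplified region (primers included), or None if no product forms.

    Exact string matching only, on the given strand of the template.
    """
    start = template.find(forward)
    if start == -1:
        return None  # forward primer has nowhere to bind
    # The reverse primer binds the other strand, so on this strand we look for
    # its reverse complement, downstream of the forward primer.
    rc = reverse_complement(reverse)
    end = template.find(rc, start + len(forward))
    if end == -1:
        return None  # reverse primer has nowhere to bind downstream
    return template[start:end + len(rc)]


# Hypothetical templates: one carries both primer sites, the other carries neither.
bacterial_like = "TTTT" + "ACGTAC" + "GGGCCCAAA" + "GATTCA" + "TTTT"
eukaryote_like = "TTTTGGGCCCAAATTTT"

print(in_silico_pcr(bacterial_like, "ACGTAC", "TGAATC"))  # ACGTACGGGCCCAAAGATTCA
print(in_silico_pcr(eukaryote_like, "ACGTAC", "TGAATC"))  # None
```

Only the stretch between the two primer sites comes out of the bacterial-like template, and the template lacking the conserved sites yields nothing, mirroring how eukaryotic DNA stays unamplified.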

Despite coding for such a fundamental piece of RNA, the 16S region still varies surprisingly across different bacteria, and it is those differences that will allow us to identify which bacteria are present in the river.

The variability exists because certain nucleotides are not essential for the function of the 16S RNA. This variability happens mostly in https://en.wikipedia.org/wiki/Stem-loop[RNA loops as opposed to stems], i.e. parts of the RNA that don't base pair with other RNA in the https://en.wikipedia.org/wiki/Nucleic_acid_secondary_structure[RNA secondary structure], as shown at: <code RNA stem-loop structure>{full}.

``
                A-U
               /   \
A-U-C-G-A-U-C-G     C
| | | | | | | |     |
U-A-G-C-U-A-G-C     G
               \   /
                U-A
|             ||    |
+-------------++----+
    stem        loop
``
{title=RNA stem-loop structure}

This is what the 16S RNA secondary structure looks like in its full glory: <image 16S RNA secondary structure>{full}.

\Image[https://upload.wikimedia.org/wikipedia/commons/a/a6/16S.svg]
{height=500}
{title=16S RNA secondary structure}

Since loops don't base pair, they are less crucial for determining the secondary structure of the RNA, so mutations there are more likely to be tolerated.
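The argument that loops tolerate mutations better than stems can be checked mechanically with dot-bracket notation, a common text format for RNA secondary structure in which matching `(` and `)` mark paired bases and `.` marks unpaired ones. The sketch below encodes the stem-loop figure above as a hypothetical 22-nucleotide sequence (8-nucleotide stem, 6-nucleotide loop, 8-nucleotide stem, read 5' to 3' from the figure) and tests whether a point mutation still lets every paired position form an A-U, G-C or G-U wobble pair. It only checks compatibility with a fixed structure; it does not predict how the RNA folds.

```python
from typing import Dict, List

# Watson-Crick pairs plus the G-U wobble pair, which is common in RNA.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_table(structure: str) -> Dict[int, int]:
    """Map every paired position to its partner, given dot-bracket notation."""
    stack: List[int] = []
    pairs: Dict[int, int] = {}
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def is_compatible(sequence: str, structure: str) -> bool:
    """True if every paired position in `structure` holds an allowed base pair."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    return all((sequence[i], sequence[j]) in ALLOWED_PAIRS
               for i, j in pair_table(structure).items())


def mutate(sequence: str, position: int, base: str) -> str:
    """Return a copy of `sequence` with a single substitution at `position`."""
    return sequence[:position] + base + sequence[position + 1:]


# Hypothetical reading of the stem-loop figure.
hairpin = "AUCGAUCG" + "AUCGAU" + "CGAUCGAU"
structure = "(" * 8 + "." * 6 + ")" * 8

print(is_compatible(hairpin, structure))                   # True
print(is_compatible(mutate(hairpin, 10, "G"), structure))  # True: position 10 is in the loop
print(is_compatible(mutate(hairpin, 2, "A"), structure))   # False: A cannot pair with the G at 19
```

A substitution in the loop leaves every pair intact, while the same kind of substitution in the stem breaks a pair, which is why loop positions can drift between species without wrecking the structure.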

The variability is such that it is possible to tell individual species apart if full sequences are known with certainty.

With the limitations of this experiment, however, such as the relatively high error rate of individual nanopore reads, we would only be able to obtain https://en.wikipedia.org/wiki/Family_(biology)[family] or https://en.wikipedia.org/wiki/Genus[genus] level breakdowns.
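Why read errors push us to coarser ranks can be sketched with a toy nearest-reference classifier: a read is assigned to a species only if it matches that species' reference better than the runner-up by some `margin`, which here stands in for the uncertainty introduced by sequencing errors; otherwise it falls back to the genus shared by the two best matches. The species and genus names, the 20-nucleotide sequences and the margins are all hypothetical, and the sequences are assumed to be already aligned to the same length, whereas real pipelines first align reads against a full 16S reference database.

```python
from typing import Dict, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def classify(read: str, references: Dict[str, Tuple[str, str]], margin: float) -> str:
    """Return a species, a genus, or "unclassified" for `read`.

    `references` maps species name -> (genus name, aligned reference sequence).
    The species is reported only if the best match beats the runner-up by `margin`.
    """
    if len(references) < 2:
        raise ValueError("need at least two references to compare")
    scored = sorted(
        ((identity(read, seq), species, genus)
         for species, (genus, seq) in references.items()),
        reverse=True,
    )
    (best_id, best_species, best_genus), (second_id, _, second_genus) = scored[0], scored[1]
    if best_id - second_id >= margin:
        return best_species
    if best_genus == second_genus:
        return best_genus  # too close to call at species level
    return "unclassified"


references = {
    "Examplea alpha": ("Examplea", "ACGTACGTACGTACGTACGT"),
    "Examplea beta": ("Examplea", "ACGTACGTACGAACGTACGT"),  # 1 difference from alpha
    "Otheria gamma": ("Otheria", "TCGTTCGTACCTACGTTCGA"),   # 5 differences from alpha
}
read = "ACGTACGTACGTACGTACGT"  # an error-free read from Examplea alpha

print(classify(read, references, margin=0.01))  # Examplea alpha
print(classify(read, references, margin=0.10))  # Examplea
```

With a small margin (accurate reads) the 5% difference between the two hypothetical species is enough to pick one, but once the margin exceeds that difference only the shared genus can be reported reliably.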