Source: cirosantilli/oxford-nanopore-river-bacteria/bioinformatics

= Bioinformatics

Because Ciro's a software engineer, and he's done enough staring at computers for a lifetime already, and he believes in the power of <Git>, he didn't pay much attention to this part ;-)

According to the eLife paper, the code appears to have been uploaded to: https://github.com/d-j-k/puntseq[]. TODO at least mention the key algorithms used more precisely.

Ciro can however see that it does present interesting problems!

Because we had to wait 2 days to get our own data, the workshop first illustrated the software using sample data from collections done earlier in the year.

First, some signal processing/machine learning is required to do the <base calling>, which is not trivial with Oxford Nanopore, since neighbouring bases affect each other's signal. This is however mostly handled by Oxford Nanopore itself, or by hardcore programmers in the field.

After the base calling was done, the data was analyzed using computer programs that match the sequenced 16S sequences to a database of known sequenced species.

This is of course not just a simple direct string matching problem, since, like in any experiment, the DNA reads have some errors, so the program has to find the best match even though it is not exact.
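To make the "best match even though it is not exact" idea concrete, here is a minimal sketch in Python. It is not the PuntSeq pipeline, and the source does not say which algorithm was actually used: this just picks the reference with the smallest edit distance (Levenshtein distance) to the read. The names `edit_distance`, `best_match` and `REFERENCES`, as well as the toy sequences, are made up for illustration and are not real 16S data. Real tools would need something much faster than this brute force comparison against every database entry.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-base
    substitutions, insertions or deletions needed to turn a into b."""
    # previous[j] holds the distance between the first i-1 bases of a
    # and the first j bases of b.
    previous = list(range(len(b) + 1))
    for i, base_a in enumerate(a, start=1):
        current = [i]
        for j, base_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (base_a != base_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def best_match(read: str, references: dict) -> tuple:
    """Return (name, distance) of the reference closest to the read."""
    return min(
        ((name, edit_distance(read, seq)) for name, seq in references.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical toy "database": invented sequences, not real 16S genes.
REFERENCES = {
    "species_a": "ACGTACGTGGCCTTAA",
    "species_b": "ACGTTTGTGGCATTAA",
    "species_c": "TTGTACCTGGCCAAAA",
}

# A simulated read: species_a with one base read incorrectly.
read = "ACGTACGTGCCCTTAA"

print(best_match(read, REFERENCES))
```

Even though the read does not exactly equal any entry, the closest entry is still found, together with how many errors separate them. That distance can then be used to decide whether the match is good enough to trust.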

The PuntSeq team would later upload the data to well known open databases so that it would be preserved forever! When ready, a link to the data would be posted at: https://www.puntseq.co.uk/data