Database of promoters.
E.g. for E. coli K-12 MG1655: biocyc.org/group?id=:ALL-PROMOTERS&orgid=ECOLI For some context, see E. coli K-12 MG1655 gene thrL, thrA, thrB and thrC, all of which are in the same transcription unit.
NCBI taxonomy entry: www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=511145 This links to:
- genome: www.ncbi.nlm.nih.gov/genome/?term=txid511145 From there, there are links to either:
  - Download the FASTA: "Download sequences in FASTA format for genome, protein". For the genome, you get a compressed FASTA file with extension .fna called GCF_000005845.2_ASM584v2_genomic.fna that starts with:
    >NC_000913.3 Escherichia coli str. K-12 substr. MG1655, complete genome
    AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTG
  - Interactively browse the sequence on the browser viewer: "Reference genome: Escherichia coli str. K-12 substr. MG1655", which eventually leads to: www.ncbi.nlm.nih.gov/nuccore/556503834?report=graph If we zoom into the start, we hover over the very first gene/protein: the famous (just kidding) E. coli K-12 MG1655 gene thrL, at position 190-255. The second one is the much more interesting E. coli K-12 MG1655 gene thrA.
- Gene list, with a total of 4,629 as of 2021: www.ncbi.nlm.nih.gov/gene/?term=txid511145
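As a rough illustration of working with the downloaded FASTA, here is a minimal Python sketch that reads the header and sequence, then slices out a region by position. The helper names `parse_fasta`, `open_fasta` and `region` are made up for this example, and the assumption that positions such as 190-255 are 1-based and inclusive is ours, not something stated on the NCBI page. The sample data is just the header and first sequence line quoted above, not the full genome.

```python
import gzip
import io


def parse_fasta(handle):
    """Yield (header, sequence) pairs from an open text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def open_fasta(path):
    """Open a FASTA file as text, transparently handling a .gz suffix."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def region(sequence, start, end):
    """Return positions start..end, assumed 1-based and inclusive."""
    return sequence[start - 1:end]


# Only the first line of the genome, as quoted in the text above.
SAMPLE = """>NC_000913.3 Escherichia coli str. K-12 substr. MG1655, complete genome
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTG
"""

header, sequence = next(parse_fasta(io.StringIO(SAMPLE)))
accession = header.split()[0]
print(accession, len(sequence), region(sequence, 1, 10))
```

With the real download, something like `with open_fasta("GCF_000005845.2_ASM584v2_genomic.fna") as f:` followed by `region(sequence, 190, 255)` should give the 66 bases annotated as thrL, under the coordinate assumption above.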
Contains the genes: E. coli K-12 MG1655 gene thrL, E. coli K-12 MG1655 gene thrA, E. coli K-12 MG1655 gene thrB and E. coli K-12 MG1655 gene thrC, all of which have directly linked functionality.
We can find it by searching for the species in the BioCyc promoter database. This leads to: biocyc.org/group?id=:ALL-PROMOTERS&orgid=ECOLI.
By finding the first operon by position we reach: biocyc.org/ECOLI/NEW-IMAGE?object=TU0-42486.
That page lists several components of the promoter, which we should try to understand!
Some of the transcription factors are proteins:
After the first gene in the operon, thrL, there is a rho-independent terminator. By comparing: we understand that the presence of threonine or isoleucine variants, L-threonyl and L-isoleucyl, makes rho-independent termination more efficient, so the control loop is quite direct! Not sure why it cares about isoleucine as well though.
TODO which factor is actually specific to that DNA region?
Contains the gene: E. coli K-12 MG1655 gene thrL.
Subset of the longer E. coli K-12 MG1655 transcription unit thrLABC.
Multiple genes coding for multiple proteins in one transcription unit, e.g. E. coli K-12 MG1655 gene thrL and E. coli K-12 MG1655 gene thrA are both part of the E. coli K-12 MG1655 operon thrLABC.