E. Coli K-12 MG1655 by Ciro Santilli
NCBI taxonomy entry: www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=511145 This links to:
  • genome: www.ncbi.nlm.nih.gov/genome/?term=txid511145 That page has links to either:
    • Download the FASTA: "Download sequences in FASTA format for genome, protein"
      For the genome, you get a compressed FASTA file with extension .fna called GCF_000005845.2_ASM584v2_genomic.fna that starts with:
      >NC_000913.3 Escherichia coli str. K-12 substr. MG1655, complete genome
      AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTG
      Running wc GCF_000005845.2_ASM584v2_genomic.fna gives 58022 lines. In Vim we see that each sequence line is 80 characters, except for the final one, which is 52. Excluding the header line and that final line leaves 58020 full lines, so we have 58020 * 80 + 52 = 4641652 bp =~ 4.6 Mbp
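The line-count arithmetic can be cross-checked with a few lines of Python. This is a minimal sketch, not part of the original workflow: the helper name fasta_sequence_length is hypothetical, and it assumes a plain decompressed FASTA where header lines start with ">" and every other non-empty line is sequence.

```python
def fasta_sequence_length(lines):
    """Count sequence characters in FASTA lines, skipping '>' header lines.

    Hypothetical helper for illustration: it sums over all records,
    so it equals the genome length only for a single-record file.
    """
    total = 0
    for line in lines:
        line = line.strip()  # drop the trailing newline and stray whitespace
        if not line or line.startswith(">"):
            continue  # skip blank lines and header lines
        total += len(line)
    return total


# Reproduce the arithmetic from the text: 58022 lines in total,
# one of them the header, one a short final line of 52 characters.
total_lines = 58022
header_lines = 1
full_lines = total_lines - header_lines - 1  # lines that are exactly 80 characters
expected = full_lines * 80 + 52
print(full_lines, expected)

# Tiny in-memory example so the sketch runs without downloading anything.
sample = [">toy record\n", "ACGT\n", "AC\n"]
print(fasta_sequence_length(sample))
```

With the real download, passing an open file handle for GCF_000005845.2_ASM584v2_genomic.fna to the same function should give the same total as the hand calculation, assuming the file holds only the NC_000913.3 record.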