Ciro Santilli @cirosantilli 40

 Incoming links: Protein

100 Greatest Discoveries by the Discovery Channel (2004-2005) Updated 2025-07-16

www.imdb.com/title/tt0442715/ on IMDb

Hosted by Bill Nye.

Physics topics:

Galileo: objects of different masses fall at the same speed, hammer and feather experiment
Newton: gravity, linking locally observed falls and the movement of celestial bodies
TODO a few more
superconductivity, talk only at Fermilab accelerator, no re-enactment even...
quark, interview with Murray Gell-Mann, mentions it was "an off-beat field, one wasn't encouraged to work on that". High level blablabla obviously.
fundamental interactions, notably weak interaction and strong interaction, interview with Michio Kaku. When asked "How do we know that the weak force is there?" the answer is: "We observe radioactive decay with a Geiger counter". Oh, come on!

Biology topics:

Leeuwenhoek microscope and the discovery of microorganisms, and how pond water is not dead, but teeming with life. No sample of course.
1831 Robert Brown cell nucleus in plants, and later Theodor Schwann in tadpoles. This prepared the path for the idea that "all cells come from other cells", and that there seemed to be a unifying theme to all life: the precursor to DNA discoveries. Re-enactment, yay.
1971 Carl Woese and the discovery of archaea

Genetics:

Mendel. Reenactment.
1909 Thomas Hunt Morgan with Drosophila melanogaster. Reenactment. Genes are in chromosomes. He observed that a trait was linked to sex, and it was already known that sex was related to chromosomes.
1935 George Beadle and the one gene one enzyme hypothesis by shooting X-rays at bread mold
1942 Barbara McClintock, at Cold Spring Harbor Laboratory
1952 Hershey–Chase experiment. Determined that DNA is what transmits genetic information, not protein, by radioactive labelling both protein and DNA in two sets of bacteriophages. They observed that only the DNA radioactive material was passed forward.
Crick Watson
messenger RNA, no specific scientist, too many people worked on it, done partially with bacteriophage experiments
1968 Nirenberg genetic code
1972 Hamilton O. Smith and the discovery of restriction enzymes by observing that they were part of anti bacteriophage immune-system present in bacteria
alternative splicing
RNA interference
Human Genome Project, interview with Craig Venter.

Medicine:

blood circulation
anesthesia
X-ray
germ theory of disease, with examples from Ignaz Semmelweis and Pasteur
1796 Edward Jenner discovery of vaccination by noticing that cowpox-infected subjects were immune to smallpox
vitamin by observing scurvy and beriberi in sailors, confirmed by Frederick Gowland Hopkins on mice experiments
Fleming, Florey and Chain and the discovery of penicillin
Prontosil
diabetes and insulin

E. Coli K-12 MG1655 Updated 2025-07-16

NCBI taxonomy entry: www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=511145 This links to:

genome: www.ncbi.nlm.nih.gov/genome/?term=txid511145 From there there are links to either:
- Download the FASTA: "Download sequences in FASTA format for genome, protein"
  For the genome, you get a compressed FASTA file with extension .fna called GCF_000005845.2_ASM584v2_genomic.fna that starts with:
  >NC_000913.3 Escherichia coli str. K-12 substr. MG1655, complete genome
  AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTG
  Running wc on the file, as in wc GCF_000005845.2_ASM584v2_genomic.fna, gives 58022 lines. In Vim we see that each sequence line is 80 characters, except for the final one which is 52. Excluding the header line, we have 58020 * 80 + 52 = 4641652 =~ 4.6 Mbp
- Interactively browse the sequence on the browser viewer: "Reference genome: Escherichia coli str. K-12 substr. MG1655" which eventually leads to: www.ncbi.nlm.nih.gov/nuccore/556503834?report=graph
  If we zoom into the start, we hover over the very first gene/protein: the famous (just kidding) E. Coli K-12 MG1655 gene thrL, at position 190-255.
  The second one is the much more interesting E. Coli K-12 MG1655 gene thrA.
- Gene list, with a total of 4,629 as of 2021: www.ncbi.nlm.nih.gov/gene/?term=txid511145
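The line-count arithmetic above can be double-checked with a few lines of Python. This is only a minimal sketch: `fasta_length` is a hypothetical helper (not part of any NCBI tool), and it assumes a decompressed FASTA containing a single record.

```python
def fasta_length(lines):
    """Count sequence characters in a single-record FASTA, skipping the '>' header."""
    return sum(len(line.strip()) for line in lines if not line.startswith(">"))

# Arithmetic from the text: 58022 lines = 1 header + 58020 lines of 80 + 1 final line of 52.
assert 1 + 58020 + 1 == 58022
print(58020 * 80 + 52)  # 4641652

# Tiny synthetic record to exercise the helper (not real genome data).
toy = [">toy record\n", "ACGT\n", "AC\n"]
print(fasta_length(toy))  # 6
```

On the real file, the same helper could be called on an open file handle, e.g. `fasta_length(open("GCF_000005845.2_ASM584v2_genomic.fna"))`, after decompressing the download.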

KEGG entry: www.genome.jp/pathway/eco01100+M00022

BioCyc promoter database query URL: biocyc.org/group?id=:ALL-PROMOTERS&orgid=ECOLI

E. Coli K-12 MG1655 gene thrA Updated 2025-07-16

UniProt entry: www.uniprot.org/uniprot/P00561.

NCBI entry: www.ncbi.nlm.nih.gov/gene/945803.

The second gene in the E. Coli K-12 MG1655 genome. Part of the E. Coli K-12 MG1655 operon thrLABC.

Part of a reaction that produces threonine.

This protein is an enzyme. The UniProt entry clearly shows the chemical reactions that it catalyses. In this case, there are actually two! It can transform either of the following metabolites:

"L-homoserine" into "L-aspartate 4-semialdehyde"
"L-aspartate" into "4-phospho-L-aspartate"

Also interestingly, we see that both of those reactions require some extra energy to catalyse, one needing adenosine triphosphate and the other NADP+.

TODO: any mention of how much faster it makes the reaction, numerically?

Since this is an enzyme, it would also be interesting to have a quick search for it in the KEGG entry starting from the organism: www.genome.jp/pathway/eco01100+M00022 We type in the search bar "thrA", it gives a long list, but the last entry is our "thrA". Selecting it highlights two pathways in the large graph, so we understand that it catalyzes two different reactions, as suggested by the protein name itself (fused blah blah). We can now hover over:

the edge: it shows all the enzymes that catalyze the given reaction. Both edges actually have multiple enzymes, e.g. the L-Homoserine path is also catalyzed by another enzyme called metL.
the node: they are the metabolites, e.g. one of the paths contains "L-homoserine" on one node and "L-aspartate 4-semialdehyde"

Note that common cofactors are omitted: we know this because we've learnt from the UniProt entry that this reaction uses ATP, which is not shown here.

If we now click on the L-Homoserine edge, it takes us to: www.genome.jp/entry/eco:b0002+eco:b3940. Under "Pathway" we see an interesting looking pathway "Glycine, serine and threonine metabolism": www.genome.jp/pathway/eco00260+b0002 which contains a small manually selected and extremely clearly named subset of the larger graph!

But looking at the bottom of this subgraph (the UI is not great, can't Ctrl+F and enzyme names not shown, but the selected enzyme is slightly highlighted in red because it is in the URL www.genome.jp/pathway/eco00260+b0002 vs www.genome.jp/pathway/eco00260) we clearly see that thrA, thrB and thrC form a sequence that directly transforms "L-aspartate 4-semialdehyde" into "Homoserine" to "O-Phospho-L-homoserine" and finally to threonine. This makes it crystal clear that they are not just located adjacently in the genome by chance: they are actually functionally related, and likely controlled by the same transcription factor: when you want one of them, you basically always want the three, because you must be lacking threonine. TODO find transcription factor!

The UniProt entry also shows an interactive browser of the tertiary structure of the protein. We note that there are currently two sources available: X-ray crystallography and AlphaFold. To be honest, the AlphaFold one looks quite off!!!

By inspecting the FASTA for the entire genome, or by using the NCBI open reading frame tool, we see that this gene lies entirely in its own open reading frame, so it is quite boring.

From the FASTA we see that the very first three codons at position 337 are

ATG CGA GTG

where ATG is the start codon, and CGA GTG should be the first two that actually go into the protein:

CGA: arginine
GTG: valine
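The codon lookup above can be reproduced with a tiny translation table. This is a minimal sketch: `CODON_TABLE` only holds the three codons discussed here (taken from the standard genetic code), and `translate` is a hypothetical helper name.

```python
# Partial standard genetic code: only the codons needed for this example.
CODON_TABLE = {"ATG": "M", "CGA": "R", "GTG": "V"}

def translate(dna):
    """Translate a DNA string codon by codon, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    return "".join(CODON_TABLE[codon] for codon in codons)

print(translate("ATGCGAGTG"))  # MRV
```

A full translation would need all 64 codons; a library such as Biopython provides this, but the three-entry table is enough to follow the argument.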

ecocyc.org/gene?orgid=ECOLI&id=ASPKINIHOMOSERDEHYDROGI-MONOMER mentions that the enzyme is most active as a protein complex with four copies of the same protein:

Aspartate kinase I / homoserine dehydrogenase I comprises a dimer of ThrA dimers. Although the dimeric form is catalytically active, the binding equilibrium dramatically favors the tetrameric form. The aspartate kinase and homoserine dehydrogenase activities of each ThrA monomer are catalyzed by independent domains connected by a linker region.

TODO image?

E. Coli K-12 MG1655 operon thrLABC Updated 2025-07-16

Contains the genes: E. Coli K-12 MG1655 gene thrL, E. Coli K-12 MG1655 gene thrA, E. Coli K-12 MG1655 gene thrB and E. Coli K-12 MG1655 gene thrC, all of which have directly linked functionality.

We can find it by searching for the species in the BioCyc promoter database. This leads to: biocyc.org/group?id=:ALL-PROMOTERS&orgid=ECOLI.

By finding the first operon by position we reach: biocyc.org/ECOLI/NEW-IMAGE?object=TU0-42486.

That page lists several components of the promoter, which we should try to understand!

Some of the transcription factors are proteins:

After the first gene in the operon, thrL, there is a rho-independent termination. By comparing:

we understand that the presence of threonine or isoleucine variants, L-threonyl and L-isoleucyl, makes the rho-independent termination become more efficient, so the control loop is quite direct! Not sure why it cares about isoleucine as well though.

TODO which factor is actually specific to that DNA region?

E. Coli K-12 MG1655 promoter Updated 2025-07-16

biocyc.org/group?id=:ALL-PROMOTERS&orgid=ECOLI

From this we see that there is a convention of naming promoters as protein name + p, e.g. the first gene in E. Coli K-12 MG1655 promoter thrLp encodes protein thrL.

It is also possible to add numbers after the p, e.g. at biocyc.org/ECOLI/NEW-IMAGE?type=OPERON&object=PM0-45989 we see that the protein zur has two promoters:

zurp6
zurp7

TODO why 6 and 7? There don't appear to be 1, 2, etc.

E. Coli Whole Cell Model by Covert Lab / Mass fraction summary plot analysis Created 2024-12-04 Updated 2025-07-16

Let's look into a sample plot, out/manual/plotOut/svg_plots/massFractionSummary.svg, and try to understand as much as we can about what it means and how it was generated.

This plot contains how much of each type of mass is present in all cells. Since we simulated just one cell, it will be the same as the results for that cell.

We can see that all of them grow more or less linearly, perhaps as the start of an exponential.

total dry mass (mass excluding water)
protein mass
rRNA mass
mRNA mass
DNA mass. The last label is not very visible on the plots, but we can deduce it from the source code.

By grepping the title "Cell mass fractions" in the source code, we see the files:

models/ecoli/analysis/cohort/massFractionSummary.py
models/ecoli/analysis/multigen/massFractionSummary.py
models/ecoli/analysis/variant/massFractionSummary.py

which must correspond to the different massFractionSummary plots throughout different levels of the hierarchy.

By reading models/ecoli/analysis/variant/massFractionSummary.py a little bit, we see that:

the plotting is done with Matplotlib, hurray
it is reading its data from files under ./out/manual/wildtype_000000/000000/generation_000000/000000/simOut/Mass/, more precisely ./out/manual/wildtype_000000/000000/generation_000000/000000/simOut/Mass/columns/<column-name>/data. They are binary files however.
Looking at the source for wholecell/io/tablereader.py shows that those files just use a standard NumPy serialization mechanism. Maybe they should have used the Hierarchical Data Format instead.
We can also take this opportunity to try and find where the data is coming from. The Mass component of the path ./out/manual/wildtype_000000/000000/generation_000000/000000/simOut/Mass/ looks like an ID, so we grep that and we reach models/ecoli/listeners/mass.py.
From this we understand that all data that is to be saved from a simulation must be coming from listeners: likely nothing, or not much, is dumped by default, because otherwise it would take up too much disk space. You have to explicitly say what it is that you want to save via a listener that acts on each time step.

Figure 1. Minimal condition mass fraction plot. File name: `out/manual/plotOut/svg_plots/massFractionSummary.svg`

More plot types will be explored at time series run variant, where we will contrast two runs with different growth mediums.

E. Coli Whole Cell Model by Covert Lab / Source code overview Updated 2025-07-16

The key model database is located in the source code at reconstruction/ecoli/flat.

Let's try to understand some of the interesting looking files there, with a special focus on the tiny E. Coli K-12 MG1655 operon thrLABC part of the metabolism, which we have well understood at Section "E. Coli K-12 MG1655 operon thrLABC".

We'll realize that a lot of data and IDs come from/match BioCyc quite closely.

reconstruction/ecoli/flat/compartments.tsv contains cellular compartment information:
```
"abbrev" "id"
"n" "CCO-BAC-NUCLEOID"
"j" "CCO-CELL-PROJECTION"
"w" "CCO-CW-BAC-NEG"
"c" "CCO-CYTOSOL"
"e" "CCO-EXTRACELLULAR"
"m" "CCO-MEMBRANE"
"o" "CCO-OUTER-MEM"
"p" "CCO-PERI-BAC"
"l" "CCO-PILUS"
"i" "CCO-PM-BAC-NEG"
```
- CCO: "Cell Component Ontology"
- BAC-NUCLEOID: nucleoid
- CELL-PROJECTION: cell projection
- CW-BAC-NEG: TODO confirm: cell wall (of a Gram-negative bacterium)
- CYTOSOL: cytosol
- EXTRACELLULAR: outside the cell
- MEMBRANE: cell membrane
- OUTER-MEM: bacterial outer membrane
- PERI-BAC: periplasm
- PILUS: pilus
- PM-BAC-NEG: TODO: plasma membrane, but that is the same as cell membrane no?
reconstruction/ecoli/flat/promoters.tsv contains promoter information. Simple file, sample lines:
```
"position" "direction" "id" "name"
148 "+" "PM00249" "thrLp"
```
corresponds to E. Coli K-12 MG1655 promoter thrLp, which starts at position 148.
reconstruction/ecoli/flat/proteins.tsv contains protein information. Sample line corresponding to E. Coli K-12 MG1655 gene thrA:
```
"aaCount" "name" "seq" "comments" "codingRnaSeq" "mw" "location" "rnaId" "id" "geneId"
[91, 46, 38, 44, 12, 53, 30, 63, 14, 46, 89, 34, 23, 30, 29, 51, 34, 4, 20, 0, 69] "ThrA" "MRVL..." "Location information from Ecocyc dump." "AUGCGAGUGUUG..." [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 89103.51099999998, 0.0, 0.0, 0.0, 0.0] ["c"] "EG10998_RNA" "ASPKINIHOMOSERDEHYDROGI-MONOMER" "EG10998"
```
so we understand that:
- aaCount: amino acid count, how many of each proteinogenic amino acid there are. The sample has 21 entries rather than 20, presumably because selenocysteine is also counted (TODO confirm)
- seq: full sequence, using the single letter abbreviation of the proteinogenic amino acids
- mw: molecular weight? The 11 components appear to be given at reconstruction/ecoli/flat/scripts/unifyBulkFiles.py:
  molecular_weight_keys = ['23srRNA', '16srRNA', '5srRNA', 'tRNA', 'mRNA', 'miscRNA', 'protein', 'metabolite', 'water', 'DNA', 'RNA']  # last entry: nonspecific RNA
  so they simply classify the weight? Presumably this exists for complexes that have multiple classes?
  - 23srRNA, 16srRNA, 5srRNA are the three structural RNAs present in the ribosome: 23S ribosomal RNA, 16S ribosomal RNA, 5S ribosomal RNA, all others are obvious:
  - tRNA
  - mRNA
  - protein. This is the seventh class, and this enzyme only contains mass in this class as expected.
  - metabolite
  - water
  - DNA
  - RNA: TODO rna vs miscRNA
- location: cell compartment where the protein is present, c defined at reconstruction/ecoli/flat/compartments.tsv as cytosol, as expected for something that will make an amino acid
reconstruction/ecoli/flat/rnas.tsv: TODO vs transcriptionUnits.tsv. Sample lines:
```
"halfLife" "name" "seq" "type" "modifiedForms" "monomerId" "comments" "mw" "location" "ntCount" "id" "geneId" "microarray expression"
174.0 "ThrA [RNA]" "AUGCGAGUGUUG..." "mRNA" [] "ASPKINIHOMOSERDEHYDROGI-MONOMER" "" [0.0, 0.0, 0.0, 0.0, 790935.00399999996, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] ["c"] [553, 615, 692, 603] "EG10998_RNA" "EG10998" 0.0005264904
```
- halfLife: half-life
- mw: molecular weight, same as in reconstruction/ecoli/flat/proteins.tsv. This molecule only has weight in the mRNA class, as expected, as it just codes for a protein
- location: same as in reconstruction/ecoli/flat/proteins.tsv
- ntCount: nucleotide count for each of the ATGC
- microarray expression: presumably refers to DNA microarray for gene expression profiling, but what measure exactly?
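To make the mw classification concrete, the following sketch pairs each of the 11 weight components with its class name. The key order is the one quoted from unifyBulkFiles.py and the vectors are copied from the sample lines; `nonzero_classes` is a hypothetical helper, not something from the repository.

```python
# Class names in the order quoted from unifyBulkFiles.py.
molecular_weight_keys = [
    "23srRNA", "16srRNA", "5srRNA", "tRNA", "mRNA", "miscRNA",
    "protein", "metabolite", "water", "DNA", "RNA",
]

# mw vectors copied from the ThrA sample lines of proteins.tsv and rnas.tsv.
protein_mw = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 89103.51099999998, 0.0, 0.0, 0.0, 0.0]
rna_mw = [0.0, 0.0, 0.0, 0.0, 790935.00399999996, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

def nonzero_classes(mw):
    """Return the names of the classes that carry non-zero mass."""
    return [key for key, weight in zip(molecular_weight_keys, mw) if weight != 0.0]

print(nonzero_classes(protein_mw))  # ['protein']
print(nonzero_classes(rna_mw))  # ['mRNA']
```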

reconstruction/ecoli/flat/sequence.fasta: FASTA DNA sequence, first two lines:

>E. coli K-12 MG1655 U00096.2 (1 to 4639675 = 4639675 bp)
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTG

reconstruction/ecoli/flat/transcriptionUnits.tsv: transcription units. We can observe for example the two different transcription units of the E. Coli K-12 MG1655 operon thrLABC in the lines:
```
"expression_rate" "direction" "right" "terminator_id"  "name"    "promoter_id" "degradation_rate" "id"       "gene_id"                                   "left"
0.0               "f"         310     ["TERM0-1059"]   "thrL"    "PM00249"     0.198905992329492 "TU0-42486" ["EG11277"]                                  148
657.057317358791  "f"         5022    ["TERM_WC-2174"] "thrLABC" "PM00249"     0.231049060186648 "TU00178"   ["EG10998", "EG10999", "EG11000", "EG11277"] 148
```
- promoter_id: matches promoter id in reconstruction/ecoli/flat/promoters.tsv
- gene_id: matches id in reconstruction/ecoli/flat/genes.tsv
- id: matches exactly those used in BioCyc, which is quite nice, might be more or less standardized:
  - biocyc.org/ECOLI/NEW-IMAGE?object=TU0-42486
  - biocyc.org/ECOLI/NEW-IMAGE?type=OPERON&object=TU00178
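The way promoter_id ties the two files together can be sketched as a simple lookup. The values are copied from the sample lines of promoters.tsv and transcriptionUnits.tsv; the dictionary layout and the `promoter_name` helper are assumptions made for illustration, not the model's own loader.

```python
# Values copied from the sample lines of promoters.tsv and transcriptionUnits.tsv.
promoters = {"PM00249": {"name": "thrLp", "position": 148}}
transcription_units = [
    {"id": "TU0-42486", "name": "thrL", "promoter_id": "PM00249",
     "left": 148, "right": 310, "gene_id": ["EG11277"]},
    {"id": "TU00178", "name": "thrLABC", "promoter_id": "PM00249",
     "left": 148, "right": 5022,
     "gene_id": ["EG10998", "EG10999", "EG11000", "EG11277"]},
]

def promoter_name(tu):
    """Look up the promoter name of a transcription unit via its promoter_id."""
    return promoters[tu["promoter_id"]]["name"]

for tu in transcription_units:
    # Both units start at the promoter position and differ only in where they end.
    assert tu["left"] == promoters[tu["promoter_id"]]["position"]
    print(tu["name"], promoter_name(tu), tu["right"], len(tu["gene_id"]))
# thrL thrLp 310 1
# thrLABC thrLp 5022 4
```

So both transcription units share the thrLp promoter: the short one stops after thrL, and the long one covers all four genes.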

reconstruction/ecoli/flat/genes.tsv

"length" "name"                      "seq"             "rnaId"      "coordinate" "direction" "symbol" "type" "id"      "monomerId"
66       "thr operon leader peptide" "ATGAAACGCATT..." "EG11277_RNA" 189         "+"         "thrL"   "mRNA" "EG11277" "EG11277-MONOMER"
2463     "ThrA"                      "ATGCGAGTGTTG"    "EG10998_RNA" 336         "+"         "thrA"   "mRNA" "EG10998" "ASPKINIHOMOSERDEHYDROGI-MONOMER"
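As a sanity check, the thrA numbers from genes.tsv, proteins.tsv and rnas.tsv can be compared against each other. This is a small sketch with the values copied from the sample lines; the interpretation of the one extra codon as the stop codon is an assumption.

```python
# Values copied from the thrA sample lines.
aa_count = [91, 46, 38, 44, 12, 53, 30, 63, 14, 46, 89,
            34, 23, 30, 29, 51, 34, 4, 20, 0, 69]  # proteins.tsv "aaCount"
nt_count = [553, 615, 692, 603]                     # rnas.tsv "ntCount"
gene_length = 2463                                  # genes.tsv "length"

n_amino_acids = sum(aa_count)
print(n_amino_acids)  # 820
print(sum(nt_count))  # 2463

# The nucleotide counts add up to the gene length.
assert sum(nt_count) == gene_length
# Three nucleotides per amino acid, plus one more codon (assumed: the stop codon).
assert 3 * (n_amino_acids + 1) == gene_length
```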

reconstruction/ecoli/flat/metabolites.tsv contains metabolite information. Sample lines:
```
"id"                       "mw7.2" "location"
"HOMO-SER"                 119.12  ["n", "j", "w", "c", "e", "m", "o", "p", "l", "i"]
"L-ASPARTATE-SEMIALDEHYDE" 117.104 ["n", "j", "w", "c", "e", "m", "o", "p", "l", "i"]
```
In the case of the enzyme thrA, one of the two reactions it catalyzes is "L-aspartate 4-semialdehyde" into "Homoserine".
Starting from the enzyme page: biocyc.org/gene?orgid=ECOLI&id=EG10998 we reach the reaction page: biocyc.org/ECOLI/NEW-IMAGE?type=REACTION&object=HOMOSERDEHYDROG-RXN which has reaction ID HOMOSERDEHYDROG-RXN, and that page clarifies the IDs:
- biocyc.org/compound?orgid=ECOLI&id=L-ASPARTATE-SEMIALDEHYDE: "L-aspartate 4-semialdehyde" has ID L-ASPARTATE-SEMIALDEHYDE
- biocyc.org/compound?orgid=ECOLI&id=HOMO-SER: "Homoserine" has ID HOMO-SER
so these are the compounds that we care about.

reconstruction/ecoli/flat/reactions.tsv contains chemical reaction information. Sample lines:

"reaction id" "stoichiometry" "is reversible" "catalyzed by"

"HOMOSERDEHYDROG-RXN-HOMO-SER/NAD//L-ASPARTATE-SEMIALDEHYDE/NADH/PROTON.51."
  {"NADH[c]": -1, "PROTON[c]": -1, "HOMO-SER[c]": 1, "L-ASPARTATE-SEMIALDEHYDE[c]": -1, "NAD[c]": 1}
  false
  ["ASPKINIIHOMOSERDEHYDROGII-CPLX", "ASPKINIHOMOSERDEHYDROGI-CPLX"]

"HOMOSERDEHYDROG-RXN-HOMO-SER/NADP//L-ASPARTATE-SEMIALDEHYDE/NADPH/PROTON.53."
  {"NADPH[c]": -1, "NADP[c]": 1, "PROTON[c]": -1, "L-ASPARTATE-SEMIALDEHYDE[c]": -1, "HOMO-SER[c]": 1}
  false
  ["ASPKINIIHOMOSERDEHYDROGII-CPLX", "ASPKINIHOMOSERDEHYDROGI-CPLX"]

catalyzed by: here we see ASPKINIHOMOSERDEHYDROGI-CPLX, which we can guess is a protein complex made out of ASPKINIHOMOSERDEHYDROGI-MONOMER, which is the ID for the thrA we care about! This is confirmed in complexationReactions.tsv.
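The stoichiometry column of the sample lines can be turned into a readable reaction with a few lines of Python. This sketch assumes that negative coefficients mark consumed molecules and positive ones produced molecules, which is consistent with the "L-aspartate 4-semialdehyde" into "Homoserine" direction described earlier; `split_reaction` is a hypothetical helper.

```python
# Stoichiometry copied from the first sample reaction of reactions.tsv.
stoichiometry = {
    "NADH[c]": -1,
    "PROTON[c]": -1,
    "HOMO-SER[c]": 1,
    "L-ASPARTATE-SEMIALDEHYDE[c]": -1,
    "NAD[c]": 1,
}

def split_reaction(stoich):
    """Split molecules into (reactants, products) by the sign of their coefficient."""
    reactants = sorted(m for m, coeff in stoich.items() if coeff < 0)
    products = sorted(m for m, coeff in stoich.items() if coeff > 0)
    return reactants, products

reactants, products = split_reaction(stoichiometry)
print(" + ".join(reactants), "->", " + ".join(products))
# L-ASPARTATE-SEMIALDEHYDE[c] + NADH[c] + PROTON[c] -> HOMO-SER[c] + NAD[c]
```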

reconstruction/ecoli/flat/complexationReactions.tsv contains information about chemical reactions that produce protein complexes:
```
"process" "stoichiometry" "id" "dir"
"complexation"
  [
    {
      "molecule": "ASPKINIHOMOSERDEHYDROGI-CPLX",
      "coeff": 1,
      "type": "proteincomplex",
      "location": "c",
      "form": "mature"
    },
    {
      "molecule": "ASPKINIHOMOSERDEHYDROGI-MONOMER",
      "coeff": -4,
      "type": "proteinmonomer",
      "location": "c",
      "form": "mature"
    }
  ]
"ASPKINIHOMOSERDEHYDROGI-CPLX_RXN"
1
```
The coeff of -4 means that four monomers need to get together to form the final complex. This matches the Summary section of ecocyc.org/gene?orgid=ECOLI&id=ASPKINIHOMOSERDEHYDROGI-MONOMER:
Aspartate kinase I / homoserine dehydrogenase I comprises a dimer of ThrA dimers. Although the dimeric form is catalytically active, the binding equilibrium dramatically favors the tetrameric form. The aspartate kinase and homoserine dehydrogenase activities of each ThrA monomer are catalyzed by independent domains connected by a linker region.
Fantastic literature summary! Can't find that in database form there however.
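The tetramer claim can also be checked numerically against the molecular weights in the flat files: the mw listed for the complex in proteinComplexes.tsv should be four times the monomer mw from proteins.tsv. A minimal sketch with the values copied from the sample lines:

```python
import math

monomer_mw = 89103.51099999998   # protein class of ThrA in proteins.tsv
complex_mw = 356414.04399999994  # protein class of the complex in proteinComplexes.tsv
monomers_per_complex = 4         # absolute value of the -4 coeff in complexationReactions.tsv

# Compare with a tolerance, since these are floating point values.
print(math.isclose(monomers_per_complex * monomer_mw, complex_mw))  # True
```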

reconstruction/ecoli/flat/proteinComplexes.tsv contains protein complex information:

"name" "comments" "mw" "location" "reactionId" "id"
"aspartate kinase / homoserine dehydrogenase"
""
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 356414.04399999994, 0.0, 0.0, 0.0, 0.0]
["c"]
"ASPKINIHOMOSERDEHYDROGI-CPLX_RXN"
"ASPKINIHOMOSERDEHYDROGI-CPLX"

reconstruction/ecoli/flat/protein_half_lives.tsv contains the half-life of proteins. Very few proteins are listed however for some reason.

reconstruction/ecoli/flat/tfIds.csv: transcription factors information:

"TF"   "geneId"  "oneComponentId"  "twoComponentId" "nonMetaboliteBindingId" "activeId" "notes"
"arcA" "EG10061" "PHOSPHO-ARCA"    "PHOSPHO-ARCA"
"fnr"  "EG10325" "FNR-4FE-4S-CPLX" "FNR-4FE-4S-CPLX"
"dksA" "EG10230"

Enzyme Updated 2025-07-16

A protein that is a catalyst for some chemical reaction.

For an initial concrete example, consider E. Coli K-12 MG1655 gene thrA.

Video 1. How Enzymes Work by RCSBProteinDataBank (2017). Shows in detail how aconitase catalyses the citrate to isocitrate reaction in the citric acid cycle.

History of X-ray crystallography Updated 2025-07-16

1958: myoglobin structure resolution (1958). The first protein to be resolved.
1965: lysozyme structure resolution (1965). The second protein to be resolved.

Lysozyme structure resolution (1965) Updated 2025-07-16

With X-ray crystallography by David Chilton Phillips. The second protein to be resolved after myoglobin, and the first enzyme.

Published at: Structure of Hen Egg-White Lysozyme: A Three-dimensional Fourier Synthesis at 2 Å Resolution (1965). The work was done while at the Davy Faraday Research Laboratory of the Royal Institution.

Phillips also published a lower resolution (6 angstrom) structure of the enzyme-inhibitor complexes at about the same time: Structure of Some Crystalline Lysozyme-Inhibitor Complexes Determined by X-Ray Analysis At 6 Å Resolution (1965). The point of doing this is that it reveals the active site of the enzyme.

Molecular biology technologies Updated 2025-07-16

As of 2019, the silicon industry is ending, and molecular biology technology is one of the most promising and growing fields of engineering.

Figure 1. 42 years of microprocessor trend data by Karl Rupp. Only transistor count increases, which also pushes core counts up. But what you gonna do when atomic limits are reached? The separation between two silicon atoms is 0.23nm and 2019 technology is at 5nm scale.

Such advances could one day lead to both biological super-AGI and immortality.

Ciro Santilli is especially excited about DNA-related technologies, because DNA is the centerpiece of biology, and it is programmable.

First, starting in the 2000's, the cost of DNA sequencing fell sharply, reaching about 1000 USD per genome by the end of the 2010's (Figure 2. "Cost per genome vs Moore's law from 2000 to 2019"), largely due to Illumina's technology.

As of 2019, the consequences of this revolution are still trickling down towards medical applications, inevitably, but somewhat slowly due to tight privacy control of medical records.

Ciro Santilli predicts that when the 100 dollar mark is reached, every person of the First world will have their genome sequenced, and then medical applications will be closer at hand than ever.

But even 100 dollars is not enough. Sequencing power is like computing power: humankind can never have enough. Sequencing is not a one per person thing. For example, as of 2019 tumors are already being sequenced to help understand and treat them, and scientists/doctors will sequence as many tumor cells as budget allows.

Then, in the 2010's, CRISPR/Cas9 gene editing started opening up the way to actually modifying the genome that we could now see through sequencing.

What's next?

Ciro believes that the next step in the revolution could be: de novo DNA synthesis.

This technology could be the key to one of the ultimate dreams of biologists: cheap programmable biology with push-button organism bootstrap!

Just imagine this: at the comfort of your own garage, you take some model organism of interest, maybe start humble with Escherichia coli. Then you modify its DNA to your liking, and upload it to a 3D printer sized machine on your workbench, which automatically synthesizes the DNA, and injects it into a bootstrapped cell.

You then make experiments to check if the modified cell achieves your desired new properties, e.g. production of some protein, and if not reiterate, just like a software engineer.

Of course, even if we were able to do the bootstrap, the debugging process then becomes key, as visibility is the key limitation of biology. Maybe we need other cheap technologies to come in at that point.

This is the point where we see the beauty of evolution the brightest: evolution does not require observability. But it also implies that if your changes to the organism make it less fit, then your mutation will also likely be lost. This has to be one of the considerations made when designing your organism.