Ciro Santilli @cirosantilli 40


Aminoacyl tRNA synthetase Updated 2025-07-16

Binds an amino acid to the correct corresponding tRNA sequence. Wikipedia mentions that humans have 20 of them, one for each proteinogenic amino acid.

The best articles by Ciro Santilli Updated 2025-07-16

These are the best articles ever authored by Ciro Santilli, most of them in the format of Stack Overflow answers.

Ciro posts updates about new articles on his Twitter accounts.

A chronological list of all articles is also kept at: Section "Updates".

Some random, generally less technical in-tree essays can be found at: Section "Essays by Ciro Santilli".

Trended on Hacker News:
- CIA 2010 covert communication websites on 2023-06-11. 190 points, a mild success.
- x86 Bare Metal Examples on 2019-03-19. 513 points. The third time something related to that repo trended. Hacker News people really like that repo!
  - again 2020-06-27 (archive). 200 points, repository traffic jumped from 25 daily unique visitors to 4.6k unique visitors on the day.
- How to run a program without an operating system? on 2018-11-26 (archive). 394 points. Covers x86 and ARM
- ELF Hello World Tutorial on 2017-05-17 (archive). 334 points.
- x86 Paging Tutorial on 2017-03-02. Number 1 Google search result for "x86 Paging" in 2017-08. 142 points.
  Figure 1. BIOS bare metal hello world running on a Lenovo ThinkPad T430. Source.
x86 assembly
- What does "multicore" assembly language look like?
- What is the function of the push / pop instructions used on registers in x86 assembly? Going down to memory spills, register allocation and graph coloring.
Linux kernel
QEMU
- How to add a new device in QEMU source code?
- How to generate Ubuntu debootstrap disk images for QEMU?
- How to create a multi partition SD disk image without root privileges?
- Figure 4. Ubuntu 18.04 running inside QEMU. Source. From: How to run Ubuntu desktop on QEMU?
GCC and Binutils:
- How do linkers and address relocation works?
- What is incremental linking or partial linking?
- GOLD (-fuse-ld=gold) linker vs the traditional GNU ld and LLVM lld
- What is the -fPIE option for position-independent executables in GCC and ld? Concrete examples by running program through GDB twice, and an assembly hello world with absolute vs PC relative load.
- How many GCC optimization levels are there?
- Why does GCC create a shared object instead of an executable binary according to file?
C/C++: almost all of those fall into "disassemble all the things" category. Ciro also does "standards dissection" and "a new version of the standard is out" answers, but those are boring:
- What does "static" mean in a C program?
- In C++ source, what is the effect of extern "C"?
- Char array vs Char Pointer in C
- How to compile glibc from source and use it?
- When should static_cast, dynamic_cast, const_cast and reinterpret_cast be used?
- What exactly is std::atomic in C++?. This answer was originally more appropriately entitled "Let's disassemble some stuff", and got three downvotes, so Ciro changed it to a more professional title, and it started getting upvotes. People judge books by their covers.
- notmain.o
  0000000000000000 0000000000000017 W MyTemplate<int>::f(int)
  main.o
  0000000000000000 0000000000000017 W MyTemplate<int>::f(int)
  Code 1. nm outputs showing that objects are redefined multiple times across files if you don't use template instantiation properly. From: What is explicit template instantiation in C++ and when to use it?

IEEE 754

- What is difference between quiet NaN and signaling NaN?
- In Java, what does NaN mean?

Without subnormals:

          +---+---+-------+---------------+-------------------------------+
exponent  | ? | 0 |   1   |       2       |               3               |
          +---+---+-------+---------------+-------------------------------+
          |   |   |       |               |                               |
          v   v   v       v               v                               v
          -----------------------------------------------------------------
floats    *    **** * * * *   *   *   *   *       *       *       *       *
          -----------------------------------------------------------------
          ^   ^   ^       ^               ^                               ^
          |   |   |       |               |                               |
          0   |   2^-126  2^-125          2^-124                          2^-123
              |
              2^-127

With subnormals:

          +-------+-------+---------------+-------------------------------+
exponent  |   0   |   1   |       2       |               3               |
          +-------+-------+---------------+-------------------------------+
          |       |       |               |                               |
          v       v       v               v                               v
          -----------------------------------------------------------------
floats    * * * * * * * * *   *   *   *   *       *       *       *       *
          -----------------------------------------------------------------
          ^   ^   ^       ^               ^                               ^
          |   |   |       |               |                               |
          0   |   2^-126  2^-125          2^-124                          2^-123
              |
              2^-127

Code 2. Visualization of subnormal floating point numbers vs what IEEE 754 would look like without them. From: What is a subnormal floating point number?
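To see the boundaries of the diagram on real single precision floats, the following minimal Python sketch reinterprets raw 32-bit patterns as IEEE 754 binary32 values using the standard `struct` module. The helper name `bits_to_float32` is only for illustration.

```python
import struct

def bits_to_float32(bits: int) -> float:
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 single precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# Exponent field 1, fraction 0: the smallest positive normal float32, 2^-126.
smallest_normal = bits_to_float32(0x00800000)
# Exponent field 0, fraction 1: the smallest positive subnormal float32, 2^-149.
smallest_subnormal = bits_to_float32(0x00000001)
# Exponent field 0, all fraction bits set: the largest subnormal, just below 2^-126.
largest_subnormal = bits_to_float32(0x007FFFFF)

print(smallest_normal == 2.0 ** -126)        # True
print(smallest_subnormal == 2.0 ** -149)     # True
print(largest_subnormal < smallest_normal)   # True
```

Without subnormals, every pattern with exponent field 0 except zero itself would have no value, leaving the gap between 0 and 2^-126 that the first diagram shows.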

Computer science
- Algorithms
  - Figure 5. Average insertion time into heaps, binary search tree and hash maps of the C++ standard library. Source. From: Heap vs Binary Search Tree (BST)
- Is it necessary for NP problems to be decision problems?
- Polynomial time and exponential time. Answered focusing on the definition of "exponential time".
- What is the smallest Turing machine where it is unknown if it halts or not?. Answer focusing on "blank tape" initial condition only. Large parts of it are summarizing the Busy Beaver Challenge, but some additions were made.

Git

  | 0           | 4            | 8           | C              |
  |-------------|--------------|-------------|----------------|
0 | DIRC        | Version      | File count  | ctime       ...| 0
  | ...         | mtime                      | device         |
2 | inode       | mode         | UID         | GID            | 2
  | File size   | Entry SHA-1                              ...|
4 | ...                        | Flags       | Index SHA-1 ...| 4
  | ...                                                       |

Code 3. ASCII art depicting the binary file format of the Git index file. From: What does the git index contain EXACTLY?
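The first 12 bytes of the diagram (the `DIRC` signature, the version and the file count) can be decoded with a short Python sketch. It assumes the 32-bit fields are big-endian (network byte order), as Git's index format documentation specifies; `parse_index_header` is a hypothetical helper name.

```python
import struct

def parse_index_header(data: bytes) -> tuple[int, int]:
    """Return (version, entry_count) from the first 12 bytes of a .git/index file."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError(f"not a Git index file: signature {signature!r}")
    return version, count

# Usage on a real repository (run from its top-level directory):
# with open(".git/index", "rb") as f:
#     print(parse_index_header(f.read(12)))

# Synthetic header: version 2, 3 entries.
header = b"DIRC" + struct.pack(">II", 2, 3)
print(parse_index_header(header))  # (2, 3)
```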

tree {tree_sha}
{parents}
author {author_name} <{author_email}> {author_date_seconds} {author_date_timezone}
committer {committer_name} <{committer_email}> {committer_date_seconds} {committer_date_timezone}

{commit message}

Code 4. Description of the Git commit object binary data structure. From: What is the file format of a git commit object data structure?
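The commit text of Code 4 is not hashed on its own: Git prepends a `commit <size>\0` header and takes the SHA-1 of the result, the same scheme it uses for blobs and trees. A minimal Python sketch of that hashing step follows; `git_object_sha1` is a hypothetical helper, and the author and message in the commit body are made-up placeholders.

```python
import hashlib

def git_object_sha1(obj_type: str, body: bytes) -> str:
    """Hash an object the way Git names it: '<type> <size>\\0' followed by the body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

# The empty blob and empty tree have well-known IDs, handy as sanity checks.
print(git_object_sha1("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_object_sha1("tree", b""))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904

# A commit body following the layout of Code 4 (no parents, placeholder author).
commit_body = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"  # the empty tree
    b"author Jane Doe <jane@example.com> 1700000000 +0000\n"
    b"committer Jane Doe <jane@example.com> 1700000000 +0000\n"
    b"\n"
    b"Initial commit\n"
)
print(git_object_sha1("commit", commit_body))
```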

- How do I clone a subdirectory only of a Git repository?

Python
- What is the difference between old style and new style classes in Python?
- What is a mixin in Python, and why are they useful?
- What are the differences between threads and processes in Python?
  Figure 6. Python Threads vs Processes with 8 hyperthreads. Source.
Web technology
- What does enctype='multipart/form-data' mean?
- JavaScript
  - How does JavaScript .prototype work?
  - What is the difference between .prop() vs .attr() in JavaScript?
OpenGL
- Figure 7. OpenGL rendering output dumped to a GIF file. Source. From: How to use GLUT/OpenGL to render to a file?
- Figure 8. Example of a texture atlas containing glyphs. Source.
  Image by Nicolas P. Rougier, author of Freetype GL.
  Used in Ciro Santilli's answer: How to draw text using only OpenGL methods?
- Figure 9. OpenGL glFrustum vs glOrtho. Source. From: How to use glOrtho() in OpenGL?
- What are shaders in OpenGL?
- Why do we use 4x4 matrices to transform things in 3D?
- Figure 10. Sinusoidal circular wave heatmap generated with an OpenGL shader at 60 FPS on SDL. Source.
  From: Is it possible to build a heatmap from point data at 60 times per second?
  Compared CPU vs GPU shaders.
- Image Processing with GLSL shaders? Compared the CPU and GPU for a simple blur algorithm.
  Figure 11. Source.
  Video 1. OpenGL GPU GLSL fragment shader real time v4l2 Linux webcam computer vision box blur vs CPU. Source.
Node.js
- What's the difference between dependencies, devDependencies and peerDependencies in npm package.json file?
Ruby on Rails
- What is the difference between <%, <%=, <%# and -%> in ERB in Rails?
POSIX
- What is POSIX? Huge classified overview of the most important things that POSIX specifies.

Systems programming

- What do the terms "CPU bound" and "I/O bound" mean?
- Figure 12. Plot of "real", "user" and "sys" mean times of the output of time for a CPU-bound workload with 8 threads. Source. From: What do 'real', 'user' and 'sys' mean in the output of time?

+--------+                  +------------+       +------+
| device |>---------------->| function 0 |>----->| BAR0 |
|        |                  |            |       +------+
|        |>------------+    |            |
|        |             |    |            |       +------+
   ...        ...      |    |            |>----->| BAR1 |
|        |             |    |            |       +------+
|        |>--------+   |    |            |
+--------+         |   |         ...        ...    ...
                   |   |    |            |
                   |   |    |            |       +------+
                   |   |    |            |>----->| BAR5 |
                   |   |    +------------+       +------+
                   |   |
                   |   |
                   |   |    +------------+       +------+
                   |   +--->| function 1 |>----->| BAR0 |
                   |        |            |       +------+
                   |        |            |
                   |        |            |       +------+
                   |        |            |>----->| BAR1 |
                   |        |            |       +------+
                   |        |            |
                   |             ...        ...    ...
                   |        |            |
                   |        |            |       +------+
                   |        |            |>----->| BAR5 |
                   |        +------------+       +------+
                   |
                   |
                   |             ...
                   |
                   |
                   |        +------------+       +------+
                   +------->| function 7 |>----->| BAR0 |
                            |            |       +------+
                            |            |
                            |            |       +------+
                            |            |>----->| BAR1 |
                            |            |       +------+
                            |            |
                                 ...        ...    ...
                            |            |
                            |            |       +------+
                            |            |>----->| BAR5 |
                            +------------+       +------+

Code 5. Logical structure of a PCIe device, its functions and their BARs. From: What is the Base Address Register (BAR) in PCIe?
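The fan-out of Code 5 (one device, up to 8 functions, each with up to 6 BARs) can be written down as a tiny data model. This Python sketch only encodes the limits drawn in the diagram plus the rule that function 0 always exists; the class names and the example BAR address are hypothetical, and nothing here talks to real hardware.

```python
from dataclasses import dataclass, field

MAX_FUNCTIONS = 8  # function 0 to function 7 in the diagram
MAX_BARS = 6       # BAR0 to BAR5 in the diagram (type 0 configuration header)

@dataclass
class Function:
    number: int
    bars: list[int] = field(default_factory=list)  # hypothetical BAR base addresses

    def __post_init__(self) -> None:
        if not 0 <= self.number < MAX_FUNCTIONS:
            raise ValueError(f"function number {self.number} out of range")
        if len(self.bars) > MAX_BARS:
            raise ValueError(f"a function has at most {MAX_BARS} BARs")

@dataclass
class Device:
    functions: list[Function]

    def __post_init__(self) -> None:
        numbers = [f.number for f in self.functions]
        if len(set(numbers)) != len(numbers):
            raise ValueError("duplicate function numbers")
        if 0 not in numbers:
            raise ValueError("every device implements function 0")

dev = Device([Function(0, [0xFEB00000]), Function(1)])
print(len(dev.functions))  # 2
```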

Electronics
- Raspberry Pi
  - Figure 13. Raspberry Pi 2 directly connected to a laptop with an Ethernet cable. Image from answer to: How to hook up a Raspberry Pi via Ethernet to a laptop without a router?
    Figure 14. Raspberry Pi 2 connected to a laptop with a USB UART adapter. Image from answer to: How to hook up a Raspberry Pi via Ethernet to a laptop without a router?
    Figure 15. Raspberry Pi OS being emulated on QEMU 2.5.0 on Ubuntu 16.04 with a modified kernel. Image from answer to: How to emulate the Raspberry Pi 2 on QEMU?
    Figure 16. Bare metal LED blinker program running on a Raspberry Pi 2. Image from answer to: How to run a C program with no OS on the Raspberry Pi?
Computer security
- Why is the same origin policy so important?
Media
- Video 2. Canon in D in C. Source.
  From: How is audio represented with numbers in computers?
  The original question was deleted, lol...: How to programmatically synthesize music?
- How to resize a picture using ffmpeg's sws_scale()?
- Is there any decent speech recognition software for Linux? Ran a few examples manually on vosk-api and compared them to ground truth.
Eclipse
- How to set up the Eclipse for remote C debugging with gdbserver?
Computer hardware
- Are there good open source standard cell libraries to learn IC synthesis with EDA tools?
Scientific visualization software
- Figure 17. VisIt zoom in 10 million straight line plot with some manually marked points. Source. From: Section "Survey of open source interactive plotting software with a 10 million point scatter plot benchmark by Ciro Santilli"
Numerical analysis
- Video 3. Real-time heat equation OpenGL visualization with interactive mouse cursor using relaxation method by Ciro Santilli (2016). Source.
Computational physics
- Figure 18. gnuplot plot of the y position of a sphere bouncing on a plane simulated in Bullet Physics. Source. From: What is the simplest collision example possible in a Bullet Physics simulation?
Register transfer level languages like Verilog and VHDL
- Verilog:
  Figure 19. Interactive ASDF-controlled demo with core logic written in Verilog using Verilator.
  From: Is it possible to do interactive user input and output simulation in VHDL or Verilog?
  See also: Section "Verilator interactive example"
Android
- Figure 20. Source. From: How to compile the Android AOSP kernel and test it with the Android Emulator?
- Video 4. Android screen showing live on an Ubuntu laptop through ADB. Source. From: How to see the Android screen live on an Ubuntu desktop through ADB?
Debugging
Program optimization
- What is tail call optimization?
- Figure 21. gprof2dot image generated from the gprof data of a simple test program. Source.
  From: How can I profile C++ code running on Linux?
  The answer compares gprof, valgrind callgrind, perf and gperftools on a single simple executable.
Data
- Figure 22. Mathematics dump of Wikipedia CatTree. Source. In this project, Ciro Santilli explored extracting the category and article tree out of the Wikipedia dumps.
Mathematics
- Figure 23. Diagram of the fundamental theorem on homomorphisms by Ciro Santilli (2020).
  Shows the relationship between group homomorphisms and normal subgroups.
  From: What is the intuition behind normal subgroups?
- Section "Formalization of mathematics": some early thoughts that could be expanded. Ciro almost had a stroke when he understood this stuff in his teens.
- Figure 24. Simple example of the Discrete Fourier transform. Source. That was missing from the Wikipedia page: en.wikipedia.org/wiki/Discrete_Fourier_transform!
Network programming
- How to make an HTTP get request in C without libcurl?
Physics
- What is the difference between plutonium and uranium?
- Figure 25. Spacetime diagram illustrating how faster-than-light travel implies time travel. From: Does faster than light travel imply travelling back in time?
Biology
- Figure 26. Top view of an open Oxford Nanopore MinION. Source. From: Section "How to use an Oxford Nanopore MinION to extract DNA from river water and determine which bacteria live in it"
- Figure 27. Mass fractions in a minimal growth medium vs an amino acid cut in a simulation of the E. Coli Whole Cell Model by Covert Lab. Source. From: Section "E. Coli Whole Cell Model by Covert Lab"
Quantum computing
- Section "Quantum computing is just matrix multiplication"
- Figure 28. Visualization of the continuous deformation of states as we walk around the Bloch sphere represented as photon polarization arrows. From: Understanding the Bloch sphere.
Bitcoin
- Section "Cool data embedded in the Bitcoin blockchain"
GIMP
- Figure 29. GIMP screenshot, part of: How to combine two images side-by-side in GIMP?
Home DIY
- Figure 30. Total blackout cassette roller blind with curtains. Source. From: Section "How to blackout your window without drilling"
China
- What would happen if I walked around Beijing with a t-shirt that said "freedom of speech is pretty great"?

E. Coli Whole Cell Model by Covert Lab / Condition Updated 2025-07-16

reconstruction/ecoli/flat/condition/nutrient/minimal.tsv contains the nutrients in a minimal environment in which the cell survives:

"molecule id" "lower bound (units.mmol / units.g / units.h)" "upper bound (units.mmol / units.g / units.h)"
"ADP[c]" 3.15 3.15
"PI[c]" 3.15 3.15
"PROTON[c]" 3.15 3.15
"GLC[p]" NaN 20
"OXYGEN-MOLECULE[p]" NaN NaN
"AMMONIUM[c]" NaN NaN
"PI[p]" NaN NaN
"K+[p]" NaN NaN
"SULFATE[p]" NaN NaN
"FE+2[p]" NaN NaN
"CA+2[p]" NaN NaN
"CL-[p]" NaN NaN
"CO+2[p]" NaN NaN
"MG+2[p]" NaN NaN
"MN+2[p]" NaN NaN
"NI+2[p]" NaN NaN
"ZN+2[p]" NaN NaN
"WATER[p]" NaN NaN
"CARBON-DIOXIDE[p]" NaN NaN
"CPD0-1958[p]" NaN NaN
"L-SELENOCYSTEINE[c]" NaN NaN
"GLC-D-LACTONE[c]" NaN NaN
"CYTOSINE[c]" NaN NaN

If we compare that to reconstruction/ecoli/flat/condition/nutrient/minimal_plus_amino_acids.tsv, we see that it adds the 20 amino acids on top of the minimal condition:

"L-ALPHA-ALANINE[p]" NaN NaN
"ARG[p]" NaN NaN
"ASN[p]" NaN NaN
"L-ASPARTATE[p]" NaN NaN
"CYS[p]" NaN NaN
"GLT[p]" NaN NaN
"GLN[p]" NaN NaN
"GLY[p]" NaN NaN
"HIS[p]" NaN NaN
"ILE[p]" NaN NaN
"LEU[p]" NaN NaN
"LYS[p]" NaN NaN
"MET[p]" NaN NaN
"PHE[p]" NaN NaN
"PRO[p]" NaN NaN
"SER[p]" NaN NaN
"THR[p]" NaN NaN
"TRP[p]" NaN NaN
"TYR[p]" NaN NaN
"L-SELENOCYSTEINE[c]" NaN NaN
"VAL[p]" NaN NaN

so we guess that NaN in the upper bound means infinite.
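That reading of the files can be made concrete with a small Python sketch that parses rows in the format shown above, maps `NaN` to `None` (meaning unbounded), and diffs two conditions. It assumes the real files are tab-separated with double-quoted strings; the sample rows are a subset copied from the listings above, with the header shortened, and `parse_nutrients` is a hypothetical helper.

```python
import csv
import io
import math
from typing import Dict, Optional, Tuple

Bounds = Tuple[Optional[float], Optional[float]]

def parse_nutrients(text: str) -> Dict[str, Bounds]:
    """Map molecule id -> (lower, upper) bound, reading NaN as "no bound" (None)."""

    def bound(value: str) -> Optional[float]:
        x = float(value)  # Python's float() accepts the string "NaN"
        return None if math.isnan(x) else x

    reader = csv.reader(io.StringIO(text), delimiter="\t", quotechar='"')
    next(reader)  # skip the header row
    return {row[0]: (bound(row[1]), bound(row[2])) for row in reader if row}

HEADER = '"molecule id"\t"lower bound"\t"upper bound"\n'
minimal = parse_nutrients(HEADER + '"GLC[p]"\tNaN\t20\n' + '"AMMONIUM[c]"\tNaN\tNaN\n')
plus_aa = parse_nutrients(
    HEADER + '"GLC[p]"\tNaN\t20\n' + '"AMMONIUM[c]"\tNaN\tNaN\n' + '"ARG[p]"\tNaN\tNaN\n'
)

print(sorted(plus_aa.keys() - minimal.keys()))  # ['ARG[p]']
print(minimal["GLC[p]"])                        # (None, 20.0)
```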

We can try to understand the less obvious ones:

- ADP: TODO
- PI: TODO
- PROTON[c]: presumably a measure of pH
- GLC[p]: glucose, this can be seen by comparing minimal.tsv with minimal_no_glucose.tsv
- AMMONIUM: ammonium. This appears to be the primary source of nitrogen atoms for producing amino acids.
- CYTOSINE[c]: hmmm, why is external cytosine needed? Weird.

reconstruction/ecoli/flat/condition/timeseries/ contains sequences of conditions over time. For example:
- reconstruction/ecoli/flat/condition/timeseries/000000_basal.tsv contains:
  "time (units.s)" "nutrients"
  0 "minimal"
  which means just using reconstruction/ecoli/flat/condition/nutrient/minimal.tsv forever. That is the default one used by runSim.py, as can be seen from ./out/manual/wildtype_000000/000000/generation_000000/000000/simOut/Environment/attributes/nutrientTimeSeriesLabel which contains just 000000_basal.
- reconstruction/ecoli/flat/condition/timeseries/000001_cut_glucose.tsv is more interesting and contains:
  "time (units.s)" "nutrients"
  0 "minimal"
  1200 "minimal_no_glucose"
  so we see that half-way through, at 1200 s (20 minutes), this shifts to a condition that will eventually kill the bacteria, because they will run out of glucose and thus energy!
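These timeseries files can be read as a step function: each row switches the nutrient condition from its start time onwards, until the next row. A hedged Python sketch of that interpretation follows; the `nutrients_at` helper is hypothetical and not taken from the model's code.

```python
import bisect

def nutrients_at(timeseries: list[tuple[float, str]], t: float) -> str:
    """Return the nutrient condition active at time t (seconds).

    timeseries holds (start time, condition) rows sorted by start time;
    each condition stays active until the next row's start time.
    """
    starts = [start for start, _ in timeseries]
    i = bisect.bisect_right(starts, t) - 1
    if i < 0:
        raise ValueError(f"no condition defined at t={t}")
    return timeseries[i][1]

# Rows of 000000_basal.tsv and 000001_cut_glucose.tsv as listed above.
basal = [(0, "minimal")]
cut_glucose = [(0, "minimal"), (1200, "minimal_no_glucose")]
print(nutrients_at(basal, 10**9))       # minimal
print(nutrients_at(cut_glucose, 600))   # minimal
print(nutrients_at(cut_glucose, 1200))  # minimal_no_glucose
```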
Timeseries can be selected with --variant nutrientTimeSeries X Y, see also: run variants.
We can use that variant with:
```
VARIANT="condition" FIRST_VARIANT_INDEX=1 LAST_VARIANT_INDEX=1 python runscripts/manual/runSim.py
```
reconstruction/ecoli/flat/condition/condition_defs.tsv contains lines of form:
```
"condition" "nutrients"                "genotype perturbations" "doubling time (units.min)" "active TFs"
"basal"     "minimal"                  {}                       44.0                        []
"no_oxygen" "minimal_minus_oxygen"     {}                       100.0                       []
"with_aa"   "minimal_plus_amino_acids" {}                       25.0                        ["CPLX-125", "MONOMER0-162", "CPLX0-7671", "CPLX0-228", "MONOMER0-155"]
```
- condition refers to entries in reconstruction/ecoli/flat/condition/condition_defs.tsv
- nutrients refers to entries under reconstruction/ecoli/flat/condition/nutrient/, e.g. reconstruction/ecoli/flat/condition/nutrient/minimal.tsv or reconstruction/ecoli/flat/condition/nutrient/minimal_plus_amino_acids.tsv
- genotype perturbations: there aren't any in the file, but this suggests that genotype modifications can also be incorporated here
- doubling time: TODO experimental data? Because this should be a simulation output, right? Or do they cheat and fix the doubling time?
- active TFs: this suggests that they are cheating by hardcoding the active transcription factors here, as those would ideally be functions of other more basic inputs
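The structured columns of condition_defs.tsv look like JSON, so a row can be decoded with `json.loads`, and each doubling time T_d can be turned into an exponential growth rate mu = ln(2) / T_d for comparison. This Python sketch assumes the columns are tab-separated with the quotes kept in each field; `parse_condition_row` and `growth_rate_per_min` are hypothetical helpers.

```python
import json
import math

def parse_condition_row(fields: list[str]) -> dict:
    """Decode one condition_defs.tsv row (fields already split on tabs, quotes kept)."""
    condition, nutrients, perturbations, doubling_min, active_tfs = fields
    return {
        "condition": json.loads(condition),
        "nutrients": json.loads(nutrients),
        "genotype_perturbations": json.loads(perturbations),
        "doubling_time_min": float(doubling_min),
        "active_tfs": json.loads(active_tfs),
    }

def growth_rate_per_min(doubling_time_min: float) -> float:
    """Exponential growth rate mu such that mass(t) = mass(0) * exp(mu * t)."""
    return math.log(2) / doubling_time_min

with_aa = parse_condition_row(
    ['"with_aa"', '"minimal_plus_amino_acids"', "{}", "25.0",
     '["CPLX-125", "MONOMER0-162", "CPLX0-7671", "CPLX0-228", "MONOMER0-155"]']
)
basal_rate = growth_rate_per_min(44.0)
with_aa_rate = growth_rate_per_min(with_aa["doubling_time_min"])
print(with_aa["active_tfs"][0])             # CPLX-125
print(round(with_aa_rate / basal_rate, 2))  # 1.76
```

So, per these doubling times, the amino acid condition grows about 1.76 times as fast as basal.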

E. Coli Whole Cell Model by Covert Lab / Default run variant Updated 2025-07-16

The default run variant, if you don't pass any options, just has the minimal growth conditions set. What this means can be seen at condition.

Notably, this implies a growth medium that includes glucose and salt. It also includes oxygen, which is not strictly required, but greatly benefits cell growth, and is of course easier to have than not have as it is part of the atmosphere!

But the medium does not include amino acids, which the bacteria will have to produce by themselves.

E. Coli Whole Cell Model by Covert Lab / Source code overview Updated 2025-07-16

The key model database is located in the source code at reconstruction/ecoli/flat.

Let's try to understand some of the interesting-looking files, with a special focus on the tiny E. Coli K-12 MG1655 operon thrLABC part of the metabolism, which we have already studied in detail at Section "E. Coli K-12 MG1655 operon thrLABC".

We'll realize that a lot of data and IDs come from/match BioCyc quite closely.

reconstruction/ecoli/flat/compartments.tsv contains cellular compartment information:
```
"abbrev" "id"
"n" "CCO-BAC-NUCLEOID"
"j" "CCO-CELL-PROJECTION"
"w" "CCO-CW-BAC-NEG"
"c" "CCO-CYTOSOL"
"e" "CCO-EXTRACELLULAR"
"m" "CCO-MEMBRANE"
"o" "CCO-OUTER-MEM"
"p" "CCO-PERI-BAC"
"l" "CCO-PILUS"
"i" "CCO-PM-BAC-NEG"
```
- CCO: "Cellular COmpartment"
- BAC-NUCLEOID: nucleoid
- CELL-PROJECTION: cell projection
- CW-BAC-NEG: TODO confirm: cell wall (of a Gram-negative bacteria)
- CYTOSOL: cytosol
- EXTRACELLULAR: outside the cell
- MEMBRANE: cell membrane
- OUTER-MEM: bacterial outer membrane
- PERI-BAC: periplasm
- PILUS: pilus
- PM-BAC-NEG: TODO: plasma membrane, but isn't that the same as the cell membrane?
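Molecule IDs elsewhere in the model, such as `GLC[p]` or `HOMO-SER[c]`, carry one of these abbreviations as a bracketed suffix. A small Python sketch that splits such an ID and looks the abbreviation up in the table above; the regular expression and the `split_compartment` name are assumptions, not the model's own code.

```python
import re

COMPARTMENTS = {
    "n": "CCO-BAC-NUCLEOID", "j": "CCO-CELL-PROJECTION", "w": "CCO-CW-BAC-NEG",
    "c": "CCO-CYTOSOL", "e": "CCO-EXTRACELLULAR", "m": "CCO-MEMBRANE",
    "o": "CCO-OUTER-MEM", "p": "CCO-PERI-BAC", "l": "CCO-PILUS",
    "i": "CCO-PM-BAC-NEG",
}

# Greedy name up to the final '[', then a single-letter abbreviation in brackets.
ID_PATTERN = re.compile(r"^(?P<molecule>.+)\[(?P<abbrev>[a-z])\]$")

def split_compartment(molecule_id: str) -> tuple[str, str]:
    """Split e.g. 'GLC[p]' into ('GLC', 'CCO-PERI-BAC')."""
    match = ID_PATTERN.match(molecule_id)
    if match is None:
        raise ValueError(f"no compartment suffix in {molecule_id!r}")
    return match["molecule"], COMPARTMENTS[match["abbrev"]]

print(split_compartment("GLC[p]"))       # ('GLC', 'CCO-PERI-BAC')
print(split_compartment("HOMO-SER[c]"))  # ('HOMO-SER', 'CCO-CYTOSOL')
```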
reconstruction/ecoli/flat/promoters.tsv contains promoter information. Simple file, sample lines:
```
"position" "direction" "id" "name"
148 "+" "PM00249" "thrLp"
```
corresponds to the E. Coli K-12 MG1655 promoter thrLp, which starts at position 148.
reconstruction/ecoli/flat/proteins.tsv contains protein information. Sample line corresponding to the E. Coli K-12 MG1655 gene thrA:
```
"aaCount" "name" "seq" "comments" "codingRnaSeq" "mw" "location" "rnaId" "id" "geneId"
[91, 46, 38, 44, 12, 53, 30, 63, 14, 46, 89, 34, 23, 30, 29, 51, 34, 4, 20, 0, 69] "ThrA" "MRVL..." "Location information from Ecocyc dump." "AUGCGAGUGUUG..." [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 89103.51099999998, 0.0, 0.0, 0.0, 0.0] ["c"] "EG10998_RNA" "ASPKINIHOMOSERDEHYDROGI-MONOMER" "EG10998"
```
so we understand that:
- aaCount: amino acid count, how many of each proteinogenic amino acid there are. The list has 21 entries, presumably the 20 standard ones plus selenocysteine
- seq: full sequence, using the single letter abbreviation of the proteinogenic amino acids
- mw: molecular weight? The 11 components appear to be given at reconstruction/ecoli/flat/scripts/unifyBulkFiles.py:
  molecular_weight_keys = [
    '23srRNA', '16srRNA', '5srRNA', 'tRNA', 'mRNA', 'miscRNA',
    'protein', 'metabolite', 'water', 'DNA',
    'RNA' # nonspecific RNA
  ]
  so they simply classify the weight? Presumably this exists for complexes that have multiple classes?
  - 23srRNA, 16srRNA, 5srRNA are the three structural RNAs present in the ribosome: 23S ribosomal RNA, 16S ribosomal RNA, 5S ribosomal RNA, all others are obvious:
  - tRNA
  - mRNA
  - protein. This is the seventh class, and this enzyme only contains mass in this class as expected.
  - metabolite
  - water
  - DNA
  - RNA: TODO rna vs miscRNA
- location: cell compartment where the protein is present. c is defined at reconstruction/ecoli/flat/compartments.tsv as CCO-CYTOSOL, the cytosol, as expected for something that will make an amino acid
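The ThrA numbers are internally consistent, which a few lines of Python can check: the 21 entries of `aaCount` add up to 820 residues, and 820 codons plus one stop codon give 2463 nucleotides, the `length` of thrA in genes.tsv further down. The `AA_ORDER` list is an assumption: it guesses that positions follow the amino acid order of minimal_plus_amino_acids.tsv (Ala, Arg, ..., Tyr, selenocysteine, Val), which at least places the single 0 on selenocysteine.

```python
# aaCount of ThrA from proteins.tsv.
aa_count = [91, 46, 38, 44, 12, 53, 30, 63, 14, 46, 89, 34, 23, 30, 29, 51, 34, 4, 20, 0, 69]

# Assumed order: same as the amino acid list in minimal_plus_amino_acids.tsv.
AA_ORDER = ["Ala", "Arg", "Asn", "Asp", "Cys", "Glu", "Gln", "Gly", "His", "Ile",
            "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Sec", "Val"]

residues = sum(aa_count)
coding_length = 3 * residues + 3  # one codon per residue plus the stop codon

print(residues)                               # 820
print(coding_length)                          # 2463, the thrA "length" in genes.tsv
print(dict(zip(AA_ORDER, aa_count))["Sec"])   # 0
```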
reconstruction/ecoli/flat/rnas.tsv: TODO vs transcriptionUnits.tsv. Sample lines:
```
"halfLife" "name" "seq" "type" "modifiedForms" "monomerId" "comments" "mw" "location" "ntCount" "id" "geneId" "microarray expression"
174.0 "ThrA [RNA]" "AUGCGAGUGUUG..." "mRNA" [] "ASPKINIHOMOSERDEHYDROGI-MONOMER" "" [0.0, 0.0, 0.0, 0.0, 790935.00399999996, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] ["c"] [553, 615, 692, 603] "EG10998_RNA" "EG10998" 0.0005264904
```
- halfLife: half-life
- mw: molecular weight, same as in reconstruction/ecoli/flat/proteins.tsv. This molecule only has weight in the mRNA class, as expected, as it just codes for a protein
- location: same as in reconstruction/ecoli/flat/proteins.tsv
- ntCount: nucleotide count for each of the four bases
- microarray expression: presumably refers to DNA microarray for gene expression profiling, but what measure exactly?
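A similar check on the rnas.tsv line, sketched in Python: the four `ntCount` entries sum to 2463, the same as the thrA gene length, and the only nonzero `mw` slot is the mRNA one. Summing does not depend on which base each position stands for, so this holds whatever order the column uses.

```python
nt_count = [553, 615, 692, 603]  # ntCount of ThrA [RNA] from rnas.tsv
mw = [0.0, 0.0, 0.0, 0.0, 790935.00399999996, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

# Class order from molecular_weight_keys in unifyBulkFiles.py.
MW_KEYS = ["23srRNA", "16srRNA", "5srRNA", "tRNA", "mRNA", "miscRNA",
           "protein", "metabolite", "water", "DNA", "RNA"]

print(sum(nt_count))  # 2463
print([key for key, mass in zip(MW_KEYS, mw) if mass != 0.0])  # ['mRNA']
```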

reconstruction/ecoli/flat/sequence.fasta: FASTA DNA sequence, first two lines:

>E. coli K-12 MG1655 U00096.2 (1 to 4639675 = 4639675 bp)
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTG
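Reading that file needs little more than splitting off the `>` header line and joining the wrapped sequence lines. A minimal Python sketch with a hypothetical `read_fasta` helper; on the full file, the sequence length should match the 4639675 bp stated in the header.

```python
def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs, joining wrapped sequence lines."""
    records: list[tuple[str, str]] = []
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

# Usage on the real file:
# with open("reconstruction/ecoli/flat/sequence.fasta") as f:
#     header, seq = read_fasta(f.read())[0]
#     print(len(seq))
sample = (">E. coli K-12 MG1655 U00096.2 (1 to 4639675 = 4639675 bp)\n"
          "AGCTTTTCATTCTGACTGCAAC\nGGGCAATATGTC\n")
print(read_fasta(sample)[0][1])  # AGCTTTTCATTCTGACTGCAACGGGCAATATGTC
```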

reconstruction/ecoli/flat/transcriptionUnits.tsv: transcription units. We can observe for example the two different transcription units of the E. Coli K-12 MG1655 operon thrLABC in the lines:
```
"expression_rate" "direction" "right" "terminator_id"  "name"    "promoter_id" "degradation_rate" "id"       "gene_id"                                   "left"
0.0               "f"         310     ["TERM0-1059"]   "thrL"    "PM00249"     0.198905992329492 "TU0-42486" ["EG11277"]                                  148
657.057317358791  "f"         5022    ["TERM_WC-2174"] "thrLABC" "PM00249"     0.231049060186648 "TU00178"   ["EG10998", "EG10999", "EG11000", "EG11277"] 148
```
- promoter_id: matches promoter id in reconstruction/ecoli/flat/promoters.tsv
- gene_id: matches id in reconstruction/ecoli/flat/genes.tsv
- id: matches exactly those used in BioCyc, which is quite nice, might be more or less standardized:
  - biocyc.org/ECOLI/NEW-IMAGE?object=TU0-42486
  - biocyc.org/ECOLI/NEW-IMAGE?type=OPERON&object=TU00178
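Since `promoter_id` points into promoters.tsv, the two files can be joined. This Python sketch uses hand-copied dictionaries of the sample rows above rather than a real loader, and assumes the promoter `position` is where transcription starts: for both forward (`"+"` / `"f"`) units, the TU `left` coordinate is 148, equal to the thrLp position.

```python
promoters = {"PM00249": {"position": 148, "direction": "+", "name": "thrLp"}}

transcription_units = [
    {"id": "TU0-42486", "name": "thrL", "promoter_id": "PM00249",
     "direction": "f", "left": 148, "right": 310},
    {"id": "TU00178", "name": "thrLABC", "promoter_id": "PM00249",
     "direction": "f", "left": 148, "right": 5022},
]

for tu in transcription_units:
    promoter = promoters[tu["promoter_id"]]
    # For forward units, check that the unit begins at the promoter position.
    starts_at_promoter = tu["direction"] == "f" and tu["left"] == promoter["position"]
    print(tu["name"], promoter["name"], starts_at_promoter)
# thrL thrLp True
# thrLABC thrLp True
```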

reconstruction/ecoli/flat/genes.tsv contains gene information. Sample lines:

"length" "name"                      "seq"             "rnaId"      "coordinate" "direction" "symbol" "type" "id"      "monomerId"
66       "thr operon leader peptide" "ATGAAACGCATT..." "EG11277_RNA" 189         "+"         "thrL"   "mRNA" "EG11277" "EG11277-MONOMER"
2463     "ThrA"                      "ATGCGAGTGTTG"    "EG10998_RNA" 336         "+"         "thrA"   "mRNA" "EG10998" "ASPKINIHOMOSERDEHYDROGI-MONOMER"

reconstruction/ecoli/flat/metabolites.tsv contains metabolite information. Sample lines:
```
"id"                       "mw7.2" "location"
"HOMO-SER"                 119.12  ["n", "j", "w", "c", "e", "m", "o", "p", "l", "i"]
"L-ASPARTATE-SEMIALDEHYDE" 117.104 ["n", "j", "w", "c", "e", "m", "o", "p", "l", "i"]
```
In the case of the enzyme thrA, one of the two reactions it catalyzes is "L-aspartate 4-semialdehyde" into "Homoserine".
Starting from the gene page biocyc.org/gene?orgid=ECOLI&id=EG10998 we reach the reaction page biocyc.org/ECOLI/NEW-IMAGE?type=REACTION&object=HOMOSERDEHYDROG-RXN, which has reaction ID HOMOSERDEHYDROG-RXN and clarifies the IDs:
- biocyc.org/compound?orgid=ECOLI&id=L-ASPARTATE-SEMIALDEHYDE: "L-aspartate 4-semialdehyde" has ID L-ASPARTATE-SEMIALDEHYDE
- biocyc.org/compound?orgid=ECOLI&id=HOMO-SER: "Homoserine" has ID HOMO-SER
so these are the compounds that we care about.

reconstruction/ecoli/flat/reactions.tsv contains chemical reaction information. Sample lines:

"reaction id" "stoichiometry" "is reversible" "catalyzed by"

"HOMOSERDEHYDROG-RXN-HOMO-SER/NAD//L-ASPARTATE-SEMIALDEHYDE/NADH/PROTON.51."
  {"NADH[c]": -1, "PROTON[c]": -1, "HOMO-SER[c]": 1, "L-ASPARTATE-SEMIALDEHYDE[c]": -1, "NAD[c]": 1}
  false
  ["ASPKINIIHOMOSERDEHYDROGII-CPLX", "ASPKINIHOMOSERDEHYDROGI-CPLX"]

"HOMOSERDEHYDROG-RXN-HOMO-SER/NADP//L-ASPARTATE-SEMIALDEHYDE/NADPH/PROTON.53."
  {"NADPH[c]": -1, "NADP[c]": 1, "PROTON[c]": -1, "L-ASPARTATE-SEMIALDEHYDE[c]": -1, "HOMO-SER[c]": 1}
  false
  ["ASPKINIIHOMOSERDEHYDROGII-CPLX", "ASPKINIHOMOSERDEHYDROGI-CPLX"]

catalyzed by: here we see ASPKINIHOMOSERDEHYDROGI-CPLX, which we can guess is a protein complex made out of ASPKINIHOMOSERDEHYDROGI-MONOMER, which is the ID for the thrA we care about! This is confirmed in complexationReactions.tsv.
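A quick sanity check on the stoichiometry, sketched in Python with the weights from metabolites.tsv above: homoserine is about 2.016 heavier than L-aspartate 4-semialdehyde (in the file's units, presumably g/mol), which is two hydrogen atoms at roughly 1.008 each, matching the one NADH and one proton consumed. NAD and NADH weights are not in the sample, so only this pair is compared.

```python
import math

# Stoichiometry of HOMOSERDEHYDROG-RXN (NADH variant) from reactions.tsv.
stoichiometry = {"NADH[c]": -1, "PROTON[c]": -1, "HOMO-SER[c]": 1,
                 "L-ASPARTATE-SEMIALDEHYDE[c]": -1, "NAD[c]": 1}

reactants = sorted(m for m, c in stoichiometry.items() if c < 0)
products = sorted(m for m, c in stoichiometry.items() if c > 0)

# Weights from metabolites.tsv ("mw7.2" column).
mw = {"HOMO-SER": 119.12, "L-ASPARTATE-SEMIALDEHYDE": 117.104}
HYDROGEN = 1.008  # standard atomic weight of hydrogen, rounded

gain = mw["HOMO-SER"] - mw["L-ASPARTATE-SEMIALDEHYDE"]
print(reactants)  # ['L-ASPARTATE-SEMIALDEHYDE[c]', 'NADH[c]', 'PROTON[c]']
print(products)   # ['HOMO-SER[c]', 'NAD[c]']
print(math.isclose(gain, 2 * HYDROGEN, abs_tol=1e-6))  # True
```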

reconstruction/ecoli/flat/complexationReactions.tsv contains information about chemical reactions that produce protein complexes:
```
"process" "stoichiometry" "id" "dir"
"complexation"
  [
    {
      "molecule": "ASPKINIHOMOSERDEHYDROGI-CPLX",
      "coeff": 1,
      "type": "proteincomplex",
      "location": "c",
      "form": "mature"
    },
    {
      "molecule": "ASPKINIHOMOSERDEHYDROGI-MONOMER",
      "coeff": -4,
      "type": "proteinmonomer",
      "location": "c",
      "form": "mature"
    }
  ]
"ASPKINIHOMOSERDEHYDROGI-CPLX_RXN"
1
```
The coeff is how many monomers need to get together to form the final complex. This can be seen from the Summary section of ecocyc.org/gene?orgid=ECOLI&id=ASPKINIHOMOSERDEHYDROGI-MONOMER:
Aspartate kinase I / homoserine dehydrogenase I comprises a dimer of ThrA dimers. Although the dimeric form is catalytically active, the binding equilibrium dramatically favors the tetrameric form. The aspartate kinase and homoserine dehydrogenase activities of each ThrA monomer are catalyzed by independent domains connected by a linker region.
Fantastic literature summary! Can't find that in database form there however.

reconstruction/ecoli/flat/proteinComplexes.tsv contains protein complex information:

"name" "comments" "mw" "location" "reactionId" "id"
"aspartate kinase / homoserine dehydrogenase"
""
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 356414.04399999994, 0.0, 0.0, 0.0, 0.0]
["c"]
"ASPKINIHOMOSERDEHYDROGI-CPLX_RXN"
"ASPKINIHOMOSERDEHYDROGI-CPLX"
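The complex weight agrees with the tetramer: four ThrA monomers of 89103.511 (from proteins.tsv) give 356414.044, the protein-class weight listed here. A Python sketch, reading the `-4` coefficient of complexationReactions.tsv as "monomers consumed":

```python
import math

monomer_mw = {"ASPKINIHOMOSERDEHYDROGI-MONOMER": 89103.51099999998}  # proteins.tsv
complexation = [  # stoichiometry from complexationReactions.tsv
    {"molecule": "ASPKINIHOMOSERDEHYDROGI-CPLX", "coeff": 1},
    {"molecule": "ASPKINIHOMOSERDEHYDROGI-MONOMER", "coeff": -4},
]

# Sum the weights of the consumed subunits (negative coefficients).
complex_mw = sum(-entry["coeff"] * monomer_mw[entry["molecule"]]
                 for entry in complexation if entry["coeff"] < 0)
print(math.isclose(complex_mw, 356414.04399999994))  # True
```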

reconstruction/ecoli/flat/protein_half_lives.tsv contains the half-life of proteins. Very few proteins are listed however for some reason.

reconstruction/ecoli/flat/tfIds.csv: transcription factor information:

"TF"   "geneId"  "oneComponentId"  "twoComponentId" "nonMetaboliteBindingId" "activeId" "notes"
"arcA" "EG10061" "PHOSPHO-ARCA"    "PHOSPHO-ARCA"
"fnr"  "EG10325" "FNR-4FE-4S-CPLX" "FNR-4FE-4S-CPLX"
"dksA" "EG10230"

E. Coli Whole Cell Model by Covert Lab / Time series run variant Updated 2025-07-16

To modify the nutrients as a function of time, we can select a time series with something like:

python runscripts/manual/runSim.py --variant nutrientTimeSeries 25 25

As mentioned in python runscripts/manual/runSim.py --help, nutrientTimeSeries is one of the choices from github.com/CovertLab/WholeCellEcoliRelease/blob/7e4cc9e57de76752df0f4e32eca95fb653ea64e4/models/ecoli/sim/variants/__init__.py#L57

25 25 means to start from index 25 and also end at 25, so running just one simulation. 25 27 would run 25 then 26 and then 27 for example.
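The index range is inclusive, and the index also shows up zero-padded in output directory names such as nutrientTimeSeries_000025 and wildtype_000000. A Python sketch of that naming; the format string is inferred from those directory names rather than taken from the runscripts, and `variant_dirs` is hypothetical.

```python
def variant_dirs(variant: str, first: int, last: int) -> list[str]:
    """Output directory names for an inclusive range of variant indices."""
    return [f"{variant}_{index:06d}" for index in range(first, last + 1)]

print(variant_dirs("nutrientTimeSeries", 25, 25))  # ['nutrientTimeSeries_000025']
print(variant_dirs("nutrientTimeSeries", 25, 27))
# ['nutrientTimeSeries_000025', 'nutrientTimeSeries_000026', 'nutrientTimeSeries_000027']
```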

The timeseries with index 25 is reconstruction/ecoli/flat/condition/timeseries/000025_cut_aa.tsv and contains

"time (units.s)" "nutrients"
0 "minimal_plus_amino_acids"
1200 "minimal"

so we understand that it starts with extra amino acids in the medium, which benefit the cell, and half way through those are removed at time 1200s = 20 minutes. We would therefore expect the cell to start expressing amino acid production genes at around that point.

The "nutrients" column likely means "condition" in that file however, see the bug report about 1 1 failing: github.com/CovertLab/WholeCellEcoliRelease/issues/24

When we run this, the simulation ends with:

Simulation finished:
 - Length: 0:34:23
 - Runtime: 0:08:03

so we see that the doubling time was shorter than the 0:42:49 of the run with minimal conditions, which makes sense, since during the first 20 minutes the cell had extra amino acid nutrients at its disposal.
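Since both lengths are H:MM:SS strings, a quick back-of-the-envelope comparison just converts them to seconds. A sketch with a hypothetical to_seconds helper; the numbers are the 0:34:23 length of this run and the 0:42:49 minimal-conditions length quoted above.

```python
def to_seconds(hms: str) -> int:
    """Convert an H:MM:SS string such as "0:34:23" to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return 3600 * hours + 60 * minutes + seconds


cut_aa = to_seconds("0:34:23")   # amino acid cut run: 2063 s
minimal = to_seconds("0:42:49")  # minimal medium run: 2569 s

print(minimal - cut_aa)          # 506 s, i.e. about 8.4 minutes shorter
print(1200 / cut_aa)             # about 0.58 of the cycle is spent with free amino acids
```

So the 1200 s cut lands a bit past the middle of this run's cycle.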

The output directory now contains simulation output data under out/manual/nutrientTimeSeries_000025/. Let's run analysis and plots for that:

python runscripts/manual/analysisVariant.py &&
python runscripts/manual/analysisCohort.py --variant 25 &&
python runscripts/manual/analysisMultigen.py --variant 25 &&
python runscripts/manual/analysisSingle.py --variant 25

We can now compare the outputs of this run to the default wildtype_000000 run from Section "Install and first run".

out/manual/plotOut/svg_plots/massFractionSummary.svg: because we now have two variants in the same out/ folder, wildtype_000000 and nutrientTimeSeries_000025, we now see a side by side comparison of both on the same graph!
The run variant where we started with amino acids initially grows faster as expected, because the cell didn't have to make its own amino acids, so growth is a bit more efficient.
Then, at 20 minutes, which is about 0.33 hours, we see that the cell starts growing a bit more slowly, as the slope of the curve decreases a bit, because we removed that free amino acid supply.
Figure 1.
Minimal condition vs amino acid cut mass fraction plot
. Source. From file out/manual/plotOut/svg_plots/massFractionSummary.svg.

The following plots from under out/manual/{wildtype_000000,nutrientTimeSeries_000025}/000000/generation_000000/000000/plotOut/svg_plots have been manually joined side-by-side with:

mkdir -p tmp
for f in out/manual/wildtype_000000/000000/generation_000000/000000/plotOut/svg_plots/*; do
  b="$(basename "$f")"
  echo "$f"
  svg_stack.py \
    --direction h \
    "out/manual/wildtype_000000/000000/generation_000000/000000/plotOut/svg_plots/$b" \
    "out/manual/nutrientTimeSeries_000025/000000/generation_000000/000000/plotOut/svg_plots/$b" \
    > "tmp/$b"
done

Figure 2.
Amino acid counts
. Source. `aaCounts.svg`:
default: quantities just increase
amino acid cut: there is an abrupt fall at 20 minutes when we cut off the external supply, presumably because it takes some time for the cell to start producing its own

Figure 3.
External exchange fluxes of amino acids
. Source. `aaExchangeFluxes.svg`:
default: no exchanges
amino acid cut: for all graphs except phenylalanine (PHE), either the cell was taking up the amino acid (negative flux), and that uptake goes to 0 when the supply is cut, or the flux is always 0.
For PHE however, the flux is positive, i.e. PHE is being excreted, at all times except shortly after the cut. Why? And why was there no excretion in the default conditions?
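The reading of these plots boils down to a sign convention: negative exchange flux means uptake from the medium, positive means excretion. A minimal sketch of classifying a flux time series before and after the cut under that convention; exchange_direction and before_after_cut are hypothetical helpers, and the inputs in the usage note are toy values, not data read from the simulation output.

```python
from statistics import fmean


def exchange_direction(flux: float, tol: float = 0.0) -> str:
    """Classify an exchange flux: negative is uptake, positive is excretion."""
    if flux < -tol:
        return "uptake"
    if flux > tol:
        return "excretion"
    return "none"


def before_after_cut(times, fluxes, cut_time, tol=0.0):
    """Direction of the mean flux strictly before cut_time and from cut_time on."""
    before = [f for t, f in zip(times, fluxes) if t < cut_time]
    after = [f for t, f in zip(times, fluxes) if t >= cut_time]
    return (
        exchange_direction(fmean(before), tol) if before else "none",
        exchange_direction(fmean(after), tol) if after else "none",
    )
```

For toy values shaped like the non-PHE curves, before_after_cut([0, 600, 1200, 1800], [-2.0, -2.0, 0.0, 0.0], 1200) returns ("uptake", "none").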

Figure 4.
Evaluation time
. Source. `evaluationTime.svg`: this has nothing to do with biology; rather, it is a profile of the program's runtime. We can see that the simulation gets slower and slower as time passes, presumably because there are more and more molecules to simulate.

Figure 5.
mRNA count of highly expressed mRNAs
. Source. From file `expression_rna_03_high.svg`. Each entry is a gene, named with the conventional `xyzW` gene naming scheme. E.g. here's the BioCyc page for the first entry, `tufA`: biocyc.org/gene?orgid=ECOLI&id=EG11036, which comments
Elongation factor Tu (EF-Tu) is the most abundant protein in E. coli.
and
In E. coli, EF-Tu is encoded by two genes, tufA and tufB
. What they seem to mean is that tufA and tufB are two similar genes, either of which can produce the EF-Tu protein of E. Coli, which is an important part of translation.
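The `xyzW` convention (three lowercase letters plus one uppercase letter) is regular enough to check with a regular expression. A hypothetical sketch of such a check; it only captures the common form, and plenty of real gene names deviate from it.

```python
import re

# Common xyzW style: three lowercase letters, then one uppercase letter
# distinguishing related genes, e.g. tufA / tufB. Many real names deviate.
CONVENTIONAL_GENE_NAME = re.compile(r"[a-z]{3}[A-Z]")


def looks_conventional(name: str) -> bool:
    """True if name matches the xyzW pattern exactly."""
    return CONVENTIONAL_GENE_NAME.fullmatch(name) is not None
```

tufA, tufB, arcA and dksA match, while fnr from the tfIds.csv excerpt and BioCyc IDs such as EG11036 do not.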

Figure 6.
External exchange fluxes
. Source.
`mediaExcange.svg`: this one is similar to `aaExchangeFluxes.svg`, but it also tracks other substances. The color version makes it easier to squeeze more substances into a given space, but you lose the shape of the curves a bit. The title seems reversed: red must be excretion, since that's where glucose (GLC) is.
The substances differ between the default and amino acid cut graphs; they seem to be the most exchanged substances in each run. On the amino acid cut graph, we first see the cell taking up most of them (except phenylalanine, which is excreted for some reason). When we cut the amino acids, the uptake of course stops.


Key mitochondrial proteins aren't necessarily in mtDNA Updated 2025-07-16


E.g. in humans the adenine nucleotide translocator is present in chromosome 4, not in mtDNA.

These have almost certainly been transferred to nuclear DNA in the course of evolution.

This isn't completely surprising: when mitochondria die, their DNA is kind of left behind in the cell, so it is not hard to imagine genes ending up being taken up by the nucleus. This is suggested in Power, Sex, Suicide by Nick Lane (2006) page 196.

A limiting factor appears to be that you can't just paste those genes into the nucleus: further mutations are necessary for mitochondrial protein import to work, apparently some kind of tagging with extra amino acids.

However, you likely don't want to move all genes out of the mitochondria, since the reason mitochondria keep any DNA at all is that they need to be controlled individually.


Mitochondrial protein import Updated 2025-07-16


The process that imports proteins encoded in the nuclear DNA and made in the cytosol into the mitochondria.

The term is mentioned e.g. in this article: www.nature.com/articles/nrm2959.

Power, Sex, Suicide by Nick Lane (2006) suggests that proteins are somehow tagged with extra amino acids for this.


Mycoplasma genitalium Updated 2025-07-16


www.lgcstandards-atcc.org/products/all/49896.aspx:

£355.00 in 2019
biosafety level: 2

Size: 300 x 600 nm

Reproduction time: www.quora.com/unanswered/How-long-do-Mycoplasma-bacteria-take-to-reproduce-under-optimal-conditions

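Has one of the smallest genomes known, and JCVI made a minimized strain with 473 genes: JCVI-syn3.0, although that one was derived from the related Mycoplasma mycoides rather than from genitalium itself.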

The reason why genitalium has such a small genome is that parasites tend to have smaller DNAs. So it must be highlighted that genitalium can only survive in highly enriched environments: it can't even make its own amino acids, which it normally obtains from the host cells! And because it cannot do cellular respiration, it very likely replicates more slowly than, say, E. Coli. It's easy to be small in such scenarios!

Power, Sex, Suicide by Nick Lane (2006) section "How to lose the cell wall without dying" page 184 has some related mentions, and puts it very well:

One group, the Mycoplasma, comprises mostly parasites, many of which live inside other cells. Mycoplasma cells are tiny, with very small genomes. M. genitalium, discovered in 1981, has the smallest known genome of any bacterial cell, encoding fewer than 500 genes. Despite its simplicity, it ranks among the most common of sexually transmitted diseases, producing symptoms similar to Chlamydia infection. It is so small (less than a third of a micron in diameter, or an order of magnitude smaller than most bacteria) that it must normally be viewed under the electron microscope; and difficulties culturing it meant its significance was not appreciated until the important advances in gene sequencing in the early 1990s. Like Rickettsia, Mycoplasma have lost virtually all the genes required for making nucleotides, amino acids, and so forth. Unlike Rickettsia, however, Mycoplasma have also lost all the genes for oxygen respiration, or indeed any other form of membrane respiration: they have no cytochromes, and so must rely on fermentation for energy.

Downsides mentioned at youtu.be/PSDd3oHj548?t=293:

too small to see with a light microscope
difficult to genetically manipulate. TODO why?
less literature than E. Coli.

Data:

www.ncbi.nlm.nih.gov/bioproject/97 contains genome, genes, proteins.
www.genome.jp/kegg-bin/show_pathway?mge01100 all known pathways. TODO: numerical reaction coefficients? Which enzymes mediate what? Appears to factor pathways across organisms, which is awesome.
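The pathway list can also be fetched programmatically through the KEGG REST API's list operation. A minimal sketch, assuming the organism code mge (taken from the mge01100 map URL above) and a response of one tab-separated id/name pair per line; pathway IDs may come prefixed with path:, which the parser strips just in case. The helper names are hypothetical, and this only lists pathways, it does not answer the reaction coefficient TODO.

```python
import urllib.request

KEGG_REST = "https://rest.kegg.jp"


def list_pathways_url(org_code: str) -> str:
    """URL of the KEGG REST list operation for one organism's pathways."""
    return f"{KEGG_REST}/list/pathway/{org_code}"


def parse_kegg_list(text: str) -> dict[str, str]:
    """Parse tab-separated "id<TAB>name" lines into {id: name}."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        entry_id, _, name = line.partition("\t")
        # Pathway IDs may be prefixed with "path:"; drop it if present
        entries[entry_id.removeprefix("path:")] = name
    return entries


def fetch_pathways(org_code: str = "mge") -> dict[str, str]:
    """Download and parse the pathway list (requires network access)."""
    with urllib.request.urlopen(list_pathways_url(org_code)) as response:
        return parse_kegg_list(response.read().decode("utf-8"))
```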


Parasites tend to have smaller DNAs Updated 2025-07-16


If you live in the relatively food-abundant environment of another cell, then you don't have to be able to digest every single food source in existence, or defend against a wide range of predators.

So because DNA replication is a key limiting factor of bacterial replication time, you just reduce your genome to a minimum.
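A toy estimate shows how strongly genome size drives replication time. It assumes a single origin with two forks copying the circular chromosome in opposite directions, and the same ballpark fork speed of about 1000 bp/s (a figure often quoted for E. coli) for both species. It also ignores that fast-growing E. coli can start new replication rounds before the previous one has finished, so treat it as an order-of-magnitude sketch only. Genome sizes are roughly 4.6 Mbp for E. coli and 580 kbp for M. genitalium.

```python
def replication_time_s(genome_bp: float, fork_speed_bp_per_s: float) -> float:
    """Time to copy a circular chromosome from a single origin.

    Two replication forks move away from the origin in opposite
    directions, so each copies roughly half of the genome.
    """
    return genome_bp / (2 * fork_speed_bp_per_s)


# Assumed ballpark fork speed, applied to both species for illustration
FORK_SPEED_BP_PER_S = 1000.0

for name, genome_bp in [("E. coli", 4.6e6), ("M. genitalium", 5.8e5)]:
    minutes = replication_time_s(genome_bp, FORK_SPEED_BP_PER_S) / 60
    print(f"{name}: {minutes:.1f} min")  # E. coli: 38.3 min, M. genitalium: 4.8 min
```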

And likely you also want to be as small as possible to evade the host's immune system.

Power, Sex, Suicide by Nick Lane (2006) section "Gene loss as an evolutionary trajectory" puts it well:

One of the most extreme examples of gene loss is Rickettsia prowazekii, the cause of typhus. [...] Over evolutionary time Rickettsia has lost most of its genes, and now has a mere  protein-coding genes left. [...] Rickettsia is a tiny bacterium, almost as small as a virus, which lives as a parasite inside other cells. It is so well adapted to this lifestyle that it can no longer survive outside its host cells. [...] It was able to lose most of its genes in this way simply because they were not needed: life inside other cells, if you can survive there at all, is a spoonfed existence.

and also section "How to lose the cell wall without dying" page 184 has some related mentions:

While many types of bacteria do lose their cell wall during parts of their life cycle only two groups of prokaryotes have succeeded in losing their cell walls permanently, yet lived to tell the tale. It's interesting to consider the extenuating circumstances that permitted them to do so.
[...]
One group, the Mycoplasma, comprises mostly parasites, many of which live inside other cells. Mycoplasma cells are tiny, with very small genomes. M. genitalium, discovered in 1981, has the smallest known genome of any bacterial cell, encoding fewer than 500 genes. [...] Like Rickettsia, Mycoplasma have lost virtually all the genes required for making nucleotides, amino acids, and so forth.


Sequence alignment Updated 2025-07-16


Sequence alignment is the problem of matching DNA or amino acid sequences even though they might not be exactly the same; otherwise it would just be a straight-up string search.
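To make "matching sequences that are not exactly the same" concrete, here is a minimal sketch of global alignment with the classic Needleman-Wunsch dynamic programming algorithm. The scores (+1 match, -1 mismatch, -1 per gap) are arbitrary illustrative choices; real tools typically use substitution matrices such as BLOSUM for proteins and affine gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        # Substitution score: reward identical residues, penalize the rest
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align two residues
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

E.g. needleman_wunsch("ACGT", "AGT") returns (2, "ACGT", "A-GT"): three matches and one gap where the C is missing.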

This is fundamental in bioinformatics for two reasons:

when you sequence the DNA of a new species, you can guess what each protein does by comparing it with similar proteins in other species that you have already studied
when doing DNA sequencing, and especially short-read DNA sequencing, you generally need to align the reads to reference genomes to know where you are inside the entire genome, and then be able to spot mutations, notably single-nucleotide polymorphisms
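The read-placement idea in the second point can be sketched in its most naive form: slide the read along the reference, keep the position with the fewest mismatches, and report those mismatches as candidate single-nucleotide variants. This brute-force, gapless version is only an illustration with hypothetical helper names; real short-read aligners use genome indexes, allow gaps, and take base quality into account.

```python
def best_ungapped_placement(read: str, reference: str) -> tuple[int, list[int]]:
    """Return (start, mismatch offsets) of the placement of read on reference
    with the fewest mismatches. Ties keep the leftmost placement."""
    if len(read) > len(reference):
        raise ValueError("read is longer than the reference")
    best_start, best_mismatches = 0, None
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = [i for i, (r, g) in enumerate(zip(read, window)) if r != g]
        if best_mismatches is None or len(mismatches) < len(best_mismatches):
            best_start, best_mismatches = start, mismatches
    return best_start, best_mismatches


def candidate_snps(read: str, reference: str) -> list[tuple[int, str, str]]:
    """(reference position, reference base, read base) for each mismatch
    at the best placement."""
    start, mismatches = best_ungapped_placement(read, reference)
    return [(start + i, reference[start + i], read[i]) for i in mismatches]
```

E.g. the read ACGATCA lands at position 4 of GGGGACGTTCAGGGG with a single T to A difference at reference position 7.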
