Gram-negative bacteria by Ciro Santilli 37 Updated 2025-07-16
Notable examples:
Figure 1.
Structure of a Gram-negative bacteria
. Source.
Escherichia coli by Ciro Santilli 37 Updated 2025-07-16
Size: 1-2 micrometers long and about 0.25 micrometer in diameter, so: 2 * 0.5 * 0.5 * 10e-18 and thus 0.5 micrometer square.
Genome:
  • 4k genes
  • 5 Mbps
  • www.ncbi.nlm.nih.gov/genome/167
  • wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/005/845/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_genomic.fna.gz
  • wget -O NC_000913.3.fasta 'https://www.ncbi.nlm.nih.gov/search/api/sequence/NC_000913.3/?report=fasta'
Omics modeling: www.ncbi.nlm.nih.gov/pmc/articles/PMC5611438/ Tools for Genomic and Transcriptomic Analysis of Microbes at Single-Cell Level Zixi Chen, Lei Chen, Weiwen Zhang.
20 minutes in optimal conditions, with a crazy multiple start sites mechanism: E. Coli starts DNA replication before the previous one finished.
Otherwise, naively, would take 60-90 minutes just to replicate and segregate the full DNA otherwise. So it starts copying multiple times.
The conventional starting point is not at the E. Coli K-12 MG1655 origin of replication.
biocyc.org/ECOLI/NEW-IMAGE?type=EXTRAGENIC-SITE&object=G0-10506 explains:
This site is the origin of replication of the E. coli chromosome. It contains the binding sites for DnaA, which is critical for initiation of replication. Replication proceeds bidirectionally. For historical reasons, the numbering of E. coli's circular chromosome does not start at the origin of replication, but at the origin of transfer during conjugation.
If it is a bit hard to understand what they mean by "origin of transfer" though, as that term is usually associated with the origin of transfer of bacterial conjugation.
The project is written in Python, hurray!
But according to te README, it seems to be the use a code drop model with on-request access to master. Ciro Santilli asked at rationale on GitHub discussion, and they confirmed as expected that it is to:
  • to prevent their publication ideas from being stolen. Who would steal publication ideas with public proof in an issue tracker without crediting original authors? Academia is broken. Academia should be the most open form of knowledge sharing. But instead we get this silly competition for publication points.
  • to prevent noise from non-collaborators. But they only get like 2 issues as year on such a meganiche subject... Did you know that you can ignore people, and even block them if they are particularly annoying? Much more likely is that no one will every hear about your project and that it will die with its last graduate student slave.
The project is a followup to the earlier M. genitalium whole cell model by Covert lab which modelled Mycoplasma genitalium. E. Coli has 8x more genes (500 vs 4k), but it the undisputed bacterial model organism and as such has been studied much more thoroughly. It also reproduces faster than Mycoplasma (20 minutes vs a few hours), which is a huge advantages for validation/exploratory experiments.
The project has a partial dependency on the proprietary optimization software CPLEX which is freeware, for students, not sure what it is used for exactly, from the comment in the requirements.txt the dependency is only partial.
This project makes Ciro Santilli think of the E. Coli as an optimization problem. Given such external nutrient/temperature condition, which DNA sequence makes the cell grow the fastest? Balancing metabolites feels like designing a Factorio speedrun.
There is one major thing missing thing in the current model: promoters/transcription factor interactions are not modelled due to lack/low quality of experimental data: github.com/CovertLab/WholeCellEcoliRelease/issues/21. They just have a magic direct "transcription factor to gene" relationship, encoded at reconstruction/ecoli/flat/foldChanges.tsv in terms of type "if this is present, such protein is expressed 10x more". Transcription units are not implemented at all it appears.
Everything in this section refers to version 7e4cc9e57de76752df0f4e32eca95fb653ea64e4, the code drop from November 2020, and was tested on Ubuntu 21.04 with a docker install of docker.pkg.github.com/covertlab/wholecellecolirelease/wcm-full with image id 502c3e604265, unless otherwise noted.
At 7e4cc9e57de76752df0f4e32eca95fb653ea64e4 you basically need to use the Docker image on Ubuntu 21.04 due to pip breaking changes... (not their fault). Perhaps pyenv would solve things, but who has the patience for that?!?!
The Docker setup from README does just work. The image download is a bit tedius, as it requires you to create a GitHub API key as described in the README, but there must be reasons for that.
Once the image is downloaded, you really want to run is from the root of the source tree:
sudo docker run --name=wcm -it -v "$(pwd):/wcEcoli" docker.pkg.github.com/covertlab/wholecellecolirelease/wcm-full
This mounts the host source under /wcEcoli, so you can easily edit and view output images from your host. Once inside Docker we can compile, run the simulation, and analyze results with:
make clean compile &&
python runscripts/manual/runFitter.py &&
python runscripts/manual/runSim.py &&
python runscripts/manual/analysisVariant.py &&
python runscripts/manual/analysisCohort.py &&
python runscripts/manual/analysisMultigen.py &&
python runscripts/manual/analysisSingle.py
The meaning of each of the analysis commands is described at Section "Output overview".
As a Docker refresher, after you stop the container, e.g. by restarting your computer or running sudo docker stop wcm, you can get back into it with:
sudo docker start wcm
sudo docker run -it wcm bash
runscripts/manual/runFitter.py takes about 15 minutes, and it generates files such as reconstruction/ecoli/dataclasses/process/two_component_system.py (related) which is required to run the simulation, it is basically a part of the build.
runSim.py does the main simulation, progress output contains lines of type:
Time (s)  Dry mass     Dry mass      Protein          RNA    Small mol     Expected
              (fg)  fold change  fold change  fold change  fold change  fold change
========  ========  ===========  ===========  ===========  ===========  ===========
    0.00    403.09        1.000        1.000        1.000        1.000        1.000
    0.20    403.18        1.000        1.000        1.000        1.000        1.000
and then it ended on the Lenovo ThinkPad P51 (2017) at:
 2569.18    783.09        1.943        1.910        2.005        1.950        1.963

Simulation finished:
 - Length: 0:42:49
 - Runtime: 0:09:13
when the cell had almost doubled, and presumably divided in 42 minutes of simulated time, which could make sense compared to the 20 under optimal conditions.
Run output is placed under out/:
Some of the output data is stored as .cpickle files. To observe those files, you need the original Python classes, and therefore you have to be inside Docker, from the host it won't work.
We can list all the plots that have been produced under out/ with
find -name '*.png'
Plots are also available in SVG and PDF formats, e.g.:
  • PNG: ./out/manual/plotOut/low_res_plots/massFractionSummary.png
  • SVG: ./out/manual/plotOut/svg_plots/massFractionSummary.svg The SVGs write text as polygons, see also: SVG fonts.
  • PDF: ./out/manual/plotOut/massFractionSummary.pdf
The output directory has a hierarchical structure of type:
./out/manual/wildtype_000000/000000/generation_000000/000000/
where:
  • wildtype_000000: variant conditions. wildtype is a human readable label, and 000000 is an index amongst the possible wildtype conditions. For example, we can have different simulations with different nutrients, or different DNA sequences. An example of this is shown at run variants.
  • 000000: initial random seed for the initial cell, likely fed to NumPy's np.random.seed
  • genereation_000000: this will increase with generations if we simulate multiple cells, which is supported by the model
  • 000000: this will presumably contain the cell index within a generation
We also understand that some of the top level directories contain summaries over all cells, e.g. the massFractionSummary.pdf plot exists at several levels of the hierarchy:
./out/manual/plotOut/massFractionSummary.pdf
./out/manual/wildtype_000000/plotOut/massFractionSummary.pdf
./out/manual/wildtype_000000/000000/plotOut/massFractionSummary.pdf
./out/manual/wildtype_000000/000000/generation_000000/000000/plotOut/massFractionSummary.pdf
Each of thoes four levels of plotOut is generated by a different one of the analysis scripts:
  • ./out/manual/plotOut: generated by python runscripts/manual/analysisVariant.py. Contains comparisons of different variant conditions. We confirm this by looking at the results of run variants.
  • ./out/manual/wildtype_000000/plotOut: generated by python runscripts/manual/analysisCohort.py --variant_index 0. TODO not sure how to differentiate between two different labels e.g. wildtype_000000 and somethingElse_000000. If -v is not given, a it just picks the first one alphabetically. TODO not sure how to automatically generate all of those plots without inspecting the directories.
  • ./out/manual/wildtype_000000/000000/plotOut: generated by python runscripts/manual/analysisMultigen.py --variant_index 0 --seed 0
  • ./out/manual/wildtype_000000/000000/generation_000000/000000/plotOut: generated by python runscripts/manual/analysisSingle.py --variant_index 0 --seed 0 --generation 0 --daughter 0. Contains information about a single specific cell.
It would be boring if we could only simulate the same condition all the time, so let's have a look at the different boundary conditions that we can apply to the cell!
We are able to alter things like the composition of the external medium, and the genome of the bacteria, which will make the bacteria behave differently.
The variant selection is a bit cumbersome as we have to use indexes instead of names, but one you know what you are doing, it is fine.
Of course, genetic modification is limited only to experimentally known protein interactions due to the intractability of computational protein folding and computational chemistry in general, solving those would bsai.
The default run variant, if you don't pass any options, just has the minimal growth conditions set. What this means can be seen at condition.
Notably, this implies a growth medium that includes glucose and salt. It also includes oxygen, which is not strictly required, but greatly benefits cell growth, and is of course easier to have than not have as it is part of the atmosphere!
But the medium does not include amino acids, which the bacteria will have to produce by itself.

Pinned article: Introduction to the OurBigBook Project

Welcome to the OurBigBook Project! Our goal is to create the perfect publishing platform for STEM subjects, and get university-level students to write the best free STEM tutorials ever.
Everyone is welcome to create an account and play with the site: ourbigbook.com/go/register. We belive that students themselves can write amazing tutorials, but teachers are welcome too. You can write about anything you want, it doesn't have to be STEM or even educational. Silly test content is very welcome and you won't be penalized in any way. Just keep it legal!
We have two killer features:
  1. topics: topics group articles by different users with the same title, e.g. here is the topic for the "Fundamental Theorem of Calculus" ourbigbook.com/go/topic/fundamental-theorem-of-calculus
    Articles of different users are sorted by upvote within each article page. This feature is a bit like:
    • a Wikipedia where each user can have their own version of each article
    • a Q&A website like Stack Overflow, where multiple people can give their views on a given topic, and the best ones are sorted by upvote. Except you don't need to wait for someone to ask first, and any topic goes, no matter how narrow or broad
    This feature makes it possible for readers to find better explanations of any topic created by other writers. And it allows writers to create an explanation in a place that readers might actually find it.
    Figure 1.
    Screenshot of the "Derivative" topic page
    . View it live at: ourbigbook.com/go/topic/derivative
  2. local editing: you can store all your personal knowledge base content locally in a plaintext markup format that can be edited locally and published either:
    This way you can be sure that even if OurBigBook.com were to go down one day (which we have no plans to do as it is quite cheap to host!), your content will still be perfectly readable as a static site.
    Figure 2.
    You can publish local OurBigBook lightweight markup files to either https://OurBigBook.com or as a static website
    .
    Figure 3.
    Visual Studio Code extension installation
    .
    Figure 4.
    Visual Studio Code extension tree navigation
    .
    Figure 5.
    Web editor
    . You can also edit articles on the Web editor without installing anything locally.
    Video 3.
    Edit locally and publish demo
    . Source. This shows editing OurBigBook Markup and publishing it using the Visual Studio Code extension.
    Video 4.
    OurBigBook Visual Studio Code extension editing and navigation demo
    . Source.
  3. https://raw.githubusercontent.com/ourbigbook/ourbigbook-media/master/feature/x/hilbert-space-arrow.png
  4. Infinitely deep tables of contents:
    Figure 6.
    Dynamic article tree with infinitely deep table of contents
    .
    Descendant pages can also show up as toplevel e.g.: ourbigbook.com/cirosantilli/chordate-subclade
All our software is open source and hosted at: github.com/ourbigbook/ourbigbook
Further documentation can be found at: docs.ourbigbook.com
Feel free to reach our to us for any help or suggestions: docs.ourbigbook.com/#contact