reconstruction/ecoli/flat/condition/nutrient/minimal.tsv contains the nutrients in a minimal environment in which the cell survives:

"molecule id" "lower bound (units.mmol / units.g / units.h)" "upper bound (units.mmol / units.g / units.h)"
"ADP[c]" 3.15 3.15
"PI[c]" 3.15 3.15
"PROTON[c]" 3.15 3.15
"GLC[p]" NaN 20
"OXYGEN-MOLECULE[p]" NaN NaN
"AMMONIUM[c]" NaN NaN
"PI[p]" NaN NaN
"K+[p]" NaN NaN
"SULFATE[p]" NaN NaN
"FE+2[p]" NaN NaN
"CA+2[p]" NaN NaN
"CL-[p]" NaN NaN
"CO+2[p]" NaN NaN
"MG+2[p]" NaN NaN
"MN+2[p]" NaN NaN
"NI+2[p]" NaN NaN
"ZN+2[p]" NaN NaN
"WATER[p]" NaN NaN
"CARBON-DIOXIDE[p]" NaN NaN
"CPD0-1958[p]" NaN NaN
"L-SELENOCYSTEINE[c]" NaN NaN
"GLC-D-LACTONE[c]" NaN NaN
"CYTOSINE[c]" NaN NaN

If we compare that to reconstruction/ecoli/flat/condition/nutrient/minimal_plus_amino_acids.tsv, we see that it adds the 20 amino acids on top of the minimal condition:

"L-ALPHA-ALANINE[p]" NaN NaN
"ARG[p]" NaN NaN
"ASN[p]" NaN NaN
"L-ASPARTATE[p]" NaN NaN
"CYS[p]" NaN NaN
"GLT[p]" NaN NaN
"GLN[p]" NaN NaN
"GLY[p]" NaN NaN
"HIS[p]" NaN NaN
"ILE[p]" NaN NaN
"LEU[p]" NaN NaN
"LYS[p]" NaN NaN
"MET[p]" NaN NaN
"PHE[p]" NaN NaN
"PRO[p]" NaN NaN
"SER[p]" NaN NaN
"THR[p]" NaN NaN
"TRP[p]" NaN NaN
"TYR[p]" NaN NaN
"L-SELENOCYSTEINE[c]" NaN NaN
"VAL[p]" NaN NaN

(L-SELENOCYSTEINE[c] also appears here, but it is already part of the minimal condition.)

So we guess that NaN in the upper bound likely means infinite.

We can try to understand the less obvious ones:

- ADP: TODO
- PI: TODO
- PROTON[c]: presumably a measure of pH
- GLC[p]: glucose, as can be seen by comparing minimal.tsv with minimal_no_glucose.tsv
- AMMONIUM: ammonium. This appears to be the primary source of nitrogen atoms for producing amino acids.
- CYTOSINE[c]: hmmm, why is external cytosine needed? Weird.
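To make the NaN guess concrete, here is a minimal Python sketch that reads a nutrient file into (lower, upper) bounds and lists the molecules one condition adds over another. It assumes the files are tab-separated with double-quoted IDs, as the .tsv extension and the rows above suggest, and it maps NaN to -inf/+inf purely as our interpretation. parse_nutrient_tsv and added_nutrients are hypothetical helpers, not wcEcoli functions.

```python
import csv
import io
import math


def parse_nutrient_tsv(text):
    """Parse a nutrient condition TSV into {molecule_id: (lower, upper)}.

    Assumption: tab-separated columns with double-quoted IDs, as in
    minimal.tsv. NaN bounds are mapped to -inf/+inf, which is only our
    guess that NaN means "unbounded".
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip the header row
    bounds = {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        molecule_id, lower, upper = row[0], float(row[1]), float(row[2])
        bounds[molecule_id] = (
            -math.inf if math.isnan(lower) else lower,
            math.inf if math.isnan(upper) else upper,
        )
    return bounds


def added_nutrients(base, extended):
    """Molecule IDs present in `extended` but not in `base`, sorted."""
    return sorted(set(extended) - set(base))
```

If minimal_plus_amino_acids.tsv repeats all of the minimal rows, calling added_nutrients on the two parsed files should give exactly the 20 [p] amino acids, since L-SELENOCYSTEINE[c] is present in both.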
reconstruction/ecoli/flat/condition/timeseries/ contains sequences of conditions over time. For example:

- reconstruction/ecoli/flat/condition/timeseries/000000_basal.tsv contains:

  "time (units.s)" "nutrients"
  0 "minimal"

  which means just using reconstruction/ecoli/flat/condition/nutrient/minimal.tsv until infinity. That is the default one used by runSim.py, as can be seen from ./out/manual/wildtype_000000/000000/generation_000000/000000/simOut/Environment/attributes/nutrientTimeSeriesLabel, which contains just 000000_basal.
- reconstruction/ecoli/flat/condition/timeseries/000001_cut_glucose.tsv is more interesting and contains:

  "time (units.s)" "nutrients"
  0 "minimal"
  1200 "minimal_no_glucose"

  so we see that this will shift the conditions about half-way through (1200 s = 20 min, roughly half of the 44 min basal doubling time) to a condition that will eventually kill the bacteria, because it will run out of glucose and thus energy!
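The way these files are read, where each row stays in effect until the next one, can be sketched as a lookup of the condition active at a given time. This assumes each row applies from its start time until the next row's start time, and the last row applies forever, matching the "until infinity" reading above. parse_timeseries and nutrients_at are hypothetical names, and the tab-separated layout is an assumption.

```python
import bisect
import csv
import io


def parse_timeseries(text):
    """Parse a timeseries TSV into sorted (time_s, nutrients) pairs.

    Assumes the two tab-separated columns "time (units.s)" and "nutrients".
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip the header row
    rows = [(float(t), nutrients) for t, nutrients in (r for r in reader if r)]
    return sorted(rows)


def nutrients_at(timeseries, t):
    """Return the nutrient condition in effect at time t (seconds).

    Each entry is assumed to stay active until the next one starts,
    and the last entry stays active forever.
    """
    starts = [start for start, _ in timeseries]
    i = bisect.bisect_right(starts, t) - 1  # last entry starting at or before t
    if i < 0:
        raise ValueError(f"no condition defined at t={t}")
    return timeseries[i][1]
```

With 000001_cut_glucose.tsv, nutrients_at returns "minimal" for any t below 1200 and "minimal_no_glucose" from 1200 onwards.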
Timeseries can be selected with --variant nutrientTimeSeries X Y, see also: run variants. We can use that variant with:

VARIANT="condition" FIRST_VARIANT_INDEX=1 LAST_VARIANT_INDEX=1 python runscripts/manual/runSim.py
reconstruction/ecoli/flat/condition/condition_defs.tsv contains lines of the form:

"condition" "nutrients" "genotype perturbations" "doubling time (units.min)" "active TFs"
"basal" "minimal" {} 44.0 []
"no_oxygen" "minimal_minus_oxygen" {} 100.0 []
"with_aa" "minimal_plus_amino_acids" {} 25.0 ["CPLX-125", "MONOMER0-162", "CPLX0-7671", "CPLX0-228", "MONOMER0-155"]
The columns are:

- condition: the name of each entry in reconstruction/ecoli/flat/condition/condition_defs.tsv
- nutrients: refers to entries under reconstruction/ecoli/flat/condition/nutrient/, e.g. reconstruction/ecoli/flat/condition/nutrient/minimal.tsv or reconstruction/ecoli/flat/condition/nutrient/minimal_plus_amino_acids.tsv
- genotype perturbations: there aren't any in the file, but this suggests that genotype modifications can also be incorporated here
- doubling time: TODO experimental data? Because this should be a simulation output, right? Or do they cheat and fix the doubling time by hand?
- active TFs: this suggests that they are cheating on transcription factors here by hardcoding which ones are active, as those would ideally be functions of other more basic inputs
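The {} and [...] columns look like JSON literals, so a loader could plausibly decode them with json.loads. The following is a sketch under that assumption; parse_condition_defs is a hypothetical helper, the tab-separated layout is assumed, and the real wcEcoli TSV reader may handle these columns differently.

```python
import csv
import io
import json


def parse_condition_defs(text):
    """Parse condition_defs.tsv into {condition_name: {...}}.

    Assumption: "genotype perturbations" ({}) and "active TFs" ([...])
    are JSON literals; this is only inferred from the rows shown above.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip the header row
    conditions = {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        name, nutrients, perturbations, doubling_min, active_tfs = row
        conditions[name] = {
            "nutrients": nutrients,  # name of a file under condition/nutrient/
            "genotype_perturbations": json.loads(perturbations),
            "doubling_time_min": float(doubling_min),
            "active_tfs": json.loads(active_tfs),
        }
    return conditions
```

Keying the result by condition name makes it easy to go from a row such as "with_aa" to its nutrient file and its hardcoded list of active TFs.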
Run output is placed under out/. Some of the output data is stored as .cpickle files. To inspect those files, you need the original Python classes, and therefore you have to be inside Docker; it won't work from the host.

We can list all the plots that have been produced under out/ with:

find out/ -name '*.png'

Plots are also available in SVG and PDF formats.
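To collect all three plot formats in one pass, the find command can be sketched with pathlib. list_plots is a hypothetical helper that only looks at file extensions; it does not know anything about which script produced each plot.

```python
from collections import defaultdict
from pathlib import Path


def list_plots(root, extensions=(".png", ".svg", ".pdf")):
    """Group plot files under `root` by extension.

    Roughly `find root -name '*.png'`, repeated for each extension.
    """
    plots = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        suffix = path.suffix.lower()
        if path.is_file() and suffix in extensions:
            plots[suffix].append(path)
    return dict(plots)
```

For example, list_plots("out")[".pdf"] would include every massFractionSummary.pdf at each level of the hierarchy.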
The output directory has a hierarchical structure of type:

./out/manual/wildtype_000000/000000/generation_000000/000000/

where:

- wildtype_000000: variant conditions. wildtype is a human readable label, and 000000 is an index amongst the possible wildtype conditions. For example, we can have different simulations with different nutrients, or different DNA sequences. An example of this is shown at run variants.
- 000000: initial random seed for the initial cell, likely fed to NumPy's np.random.seed
- generation_000000: this will increase with generations if we simulate multiple cells, which is supported by the model
- 000000: this will presumably contain the cell index within a generation
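The naming scheme above can be captured in a pair of hypothetical helpers that build and parse a per-cell output directory. The six-digit zero padding is taken from the example path; the meanings of the fields (seed, index within a generation) are our interpretation, not something confirmed.

```python
import re


def cell_out_dir(variant_label, variant_index, seed, generation, daughter,
                 root="./out/manual"):
    """Build a per-cell output directory such as
    ./out/manual/wildtype_000000/000000/generation_000000/000000."""
    return "/".join([
        root,
        f"{variant_label}_{variant_index:06d}",  # variant label + index
        f"{seed:06d}",                           # presumably the random seed
        f"generation_{generation:06d}",
        f"{daughter:06d}",                       # presumably cell index in generation
    ])


# The label is one path component; the six-digit indices follow the example path.
_CELL_DIR = re.compile(
    r"(?P<label>[^/]+)_(?P<variant>\d{6})/(?P<seed>\d{6})/"
    r"generation_(?P<generation>\d{6})/(?P<daughter>\d{6})/?$"
)


def parse_cell_out_dir(path):
    """Inverse of cell_out_dir: (label, variant, seed, generation, daughter)."""
    m = _CELL_DIR.search(path)
    if m is None:
        raise ValueError(f"not a per-cell output directory: {path}")
    return (m["label"], int(m["variant"]), int(m["seed"]),
            int(m["generation"]), int(m["daughter"]))
```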
We also understand that some of the top level directories contain summaries over all cells below them, e.g. the massFractionSummary.pdf plot exists at several levels of the hierarchy:

- ./out/manual/plotOut/massFractionSummary.pdf
- ./out/manual/wildtype_000000/plotOut/massFractionSummary.pdf
- ./out/manual/wildtype_000000/000000/plotOut/massFractionSummary.pdf
- ./out/manual/wildtype_000000/000000/generation_000000/000000/plotOut/massFractionSummary.pdf
Each of those four levels of plotOut is generated by a different one of the analysis scripts:

- ./out/manual/plotOut: generated by python runscripts/manual/analysisVariant.py. Contains comparisons of different variant conditions. We confirm this by looking at the results of run variants.
- ./out/manual/wildtype_000000/plotOut: generated by python runscripts/manual/analysisCohort.py --variant_index 0. TODO not sure how to differentiate between two different labels, e.g. wildtype_000000 and somethingElse_000000. If -v is not given, it just picks the first one alphabetically. TODO not sure how to automatically generate all of those plots without inspecting the directories.
- ./out/manual/wildtype_000000/000000/plotOut: generated by python runscripts/manual/analysisMultigen.py --variant_index 0 --seed 0
- ./out/manual/wildtype_000000/000000/generation_000000/000000/plotOut: generated by python runscripts/manual/analysisSingle.py --variant_index 0 --seed 0 --generation 0 --daughter 0. Contains information about a single specific cell.
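The mapping from plotOut level to script can be summarized by a hypothetical helper that formats the command for each level from the indices. It only builds strings with the flags listed above and runs nothing. Actually executing them (e.g. with subprocess.run from inside Docker) still requires knowing which variant, seed, generation and daughter indices exist, so this does not resolve the TODO about avoiding directory inspection.

```python
def analysis_commands(variant_index=0, seed=0, generation=0, daughter=0):
    """Map each plotOut level to the analysis command that generates it.

    Flags are copied from the commands listed above; nothing is executed.
    """
    base = "python runscripts/manual"
    return {
        # ./out/manual/plotOut
        "variant": f"{base}/analysisVariant.py",
        # ./out/manual/<label>_<variant>/plotOut
        "cohort": f"{base}/analysisCohort.py --variant_index {variant_index}",
        # ./out/manual/<label>_<variant>/<seed>/plotOut
        "multigen": (
            f"{base}/analysisMultigen.py --variant_index {variant_index} "
            f"--seed {seed}"
        ),
        # ./out/manual/<label>_<variant>/<seed>/generation_<gen>/<daughter>/plotOut
        "single": (
            f"{base}/analysisSingle.py --variant_index {variant_index} "
            f"--seed {seed} --generation {generation} --daughter {daughter}"
        ),
    }
```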