E. Coli Whole Cell Model by Covert Lab Source code overview by
Ciro Santilli 37 Updated 2025-07-01 +Created 1970-01-01
Let's try to understand some interesting looking files, with a special focus on the tiny E. Coli K-12 MG1655 operon thrLABC part of the metabolism, which we have well understood at Section "E. Coli K-12 MG1655 operon thrLABC".
reconstruction/ecoli/flat/compartments.tsv
contains cellular compartment information:
"abbrev" "id"
"n" "CCO-BAC-NUCLEOID"
"j" "CCO-CELL-PROJECTION"
"w" "CCO-CW-BAC-NEG"
"c" "CCO-CYTOSOL"
"e" "CCO-EXTRACELLULAR"
"m" "CCO-MEMBRANE"
"o" "CCO-OUTER-MEM"
"p" "CCO-PERI-BAC"
"l" "CCO-PILUS"
"i" "CCO-PM-BAC-NEG"
- CCO: "Cellular COmpartment"
- BAC-NUCLEOID: nucleoid
- CELL-PROJECTION: cell projection
- CW-BAC-NEG: TODO confirm: cell wall (of a Gram-negative bacterium)
- CYTOSOL: cytosol
- EXTRACELLULAR: outside the cell
- MEMBRANE: cell membrane
- OUTER-MEM: bacterial outer membrane
- PERI-BAC: periplasm
- PILUS: pilus
- PM-BAC-NEG: TODO: plasma membrane, but that is the same as cell membrane no?
reconstruction/ecoli/flat/promoters.tsv
contains promoter information. Simple file, sample lines:
"position" "direction" "id" "name"
148 "+" "PM00249" "thrLp"
corresponds to E. Coli K-12 MG1655 promoter thrLp, which starts at position 148.
reconstruction/ecoli/flat/proteins.tsv
contains protein information. Sample line corresponding to E. Coli K-12 MG1655 gene thrA:
"aaCount" "name" "seq" "comments" "codingRnaSeq" "mw" "location" "rnaId" "id" "geneId"
[91, 46, 38, 44, 12, 53, 30, 63, 14, 46, 89, 34, 23, 30, 29, 51, 34, 4, 20, 0, 69] "ThrA" "MRVL..." "Location information from Ecocyc dump." "AUGCGAGUGUUG..." [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 89103.51099999998, 0.0, 0.0, 0.0, 0.0] ["c"] "EG10998_RNA" "ASPKINIHOMOSERDEHYDROGI-MONOMER" "EG10998"
so we understand that:
- aaCount: amino acid count, how many of each of the 20 proteinogenic amino acids there are
- seq: full sequence, using the single letter abbreviation of the proteinogenic amino acids
- mw: molecular weight? The 11 components appear to be given at reconstruction/ecoli/flat/scripts/unifyBulkFiles.py:
  molecular_weight_keys = [
      '23srRNA',
      '16srRNA',
      '5srRNA',
      'tRNA',
      'mRNA',
      'miscRNA',
      'protein',
      'metabolite',
      'water',
      'DNA',
      'RNA' # nonspecific RNA
  ]
  so they simply classify the weight? Presumably this exists for complexes that have multiple classes?
  23srRNA, 16srRNA, 5srRNA are the three structural RNAs present in the ribosome: 23S ribosomal RNA, 16S ribosomal RNA, 5S ribosomal RNA, all others are obvious:
  - tRNA
  - mRNA
  - protein. This is the seventh class, and this enzyme only contains mass in this class as expected.
  - metabolite
  - water
  - DNA
  - RNA: TODO RNA vs miscRNA
- location: cell compartment where the protein is present, c defined at reconstruction/ecoli/flat/compartments.tsv as cytosol, as expected for something that will make an amino acid
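To make the mw vector concrete, here is a minimal sketch that pairs the 11 keys quoted from unifyBulkFiles.py with the ThrA sample values and reports which classes carry mass. The helper name `mass_by_class` is hypothetical and is not taken from the Covert Lab code base; the numbers are copied from the sample line above.

```python
# Keys copied from reconstruction/ecoli/flat/scripts/unifyBulkFiles.py as quoted above.
molecular_weight_keys = [
    '23srRNA', '16srRNA', '5srRNA', 'tRNA', 'mRNA', 'miscRNA',
    'protein', 'metabolite', 'water', 'DNA', 'RNA',
]

# "mw" and "aaCount" cells copied from the ThrA sample line of proteins.tsv.
thra_mw = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 89103.51099999998, 0.0, 0.0, 0.0, 0.0]
thra_aa_count = [91, 46, 38, 44, 12, 53, 30, 63, 14, 46, 89, 34, 23, 30, 29,
                 51, 34, 4, 20, 0, 69]


def mass_by_class(mw):
    """Hypothetical helper: map each non-zero mw component to its class name."""
    return {key: weight for key, weight in zip(molecular_weight_keys, mw) if weight != 0.0}


thra_classes = mass_by_class(thra_mw)  # only the 'protein' class is non-zero
thra_length = sum(thra_aa_count)       # total number of residues in ThrA

print(thra_classes)
print(thra_length)
```

Note that the sample aaCount vector has 21 entries rather than 20, so the file presumably tracks one extra residue type. Which one is not clear from the sample alone.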
reconstruction/ecoli/flat/rnas.tsv
: TODO vs transcriptionUnits.tsv. Sample lines:
"halfLife" "name" "seq" "type" "modifiedForms" "monomerId" "comments" "mw" "location" "ntCount" "id" "geneId" "microarray expression"
174.0 "ThrA [RNA]" "AUGCGAGUGUUG..." "mRNA" [] "ASPKINIHOMOSERDEHYDROGI-MONOMER" "" [0.0, 0.0, 0.0, 0.0, 790935.00399999996, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] ["c"] [553, 615, 692, 603] "EG10998_RNA" "EG10998" 0.0005264904
- halfLife: half-life
- mw: molecular weight, same as in reconstruction/ecoli/flat/proteins.tsv. This molecule only has weight in the mRNA class, as expected, as it just codes for a protein
- location: same as in reconstruction/ecoli/flat/proteins.tsv
- ntCount: nucleotide count for each of the ATGC
- microarray expression: presumably refers to DNA microarray for gene expression profiling, but what measure exactly?
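As a sanity check on how these files relate to each other, the following sketch compares the total of ntCount with the gene length listed in genes.tsv and with the protein length implied by aaCount in proteins.tsv. All numbers are copied from the sample lines quoted in this overview. The "one stop codon" relation is an assumption made here, not something stated by the source files.

```python
# Values copied from the sample lines quoted in this overview.
nt_count = [553, 615, 692, 603]  # "ntCount" of EG10998_RNA in rnas.tsv
gene_length = 2463               # "length" of thrA in genes.tsv
protein_length = 820             # sum of "aaCount" of ThrA in proteins.tsv

rna_length = sum(nt_count)

# Assumption: three nucleotides per amino acid, plus a single stop codon.
expected_length = 3 * (protein_length + 1)

print(rna_length, gene_length, expected_length)
```

All three numbers agree, which suggests that the rnas.tsv entry covers just the thrA coding sequence rather than the whole thrLABC transcript.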
reconstruction/ecoli/flat/sequence.fasta
: FASTA DNA sequence, first two lines:
>E. coli K-12 MG1655 U00096.2 (1 to 4639675 = 4639675 bp)
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTG
reconstruction/ecoli/flat/transcriptionUnits.tsv
: transcription units. We can observe for example the two different transcription units of the E. Coli K-12 MG1655 operon thrLABC in the lines:
"expression_rate" "direction" "right" "terminator_id" "name" "promoter_id" "degradation_rate" "id" "gene_id" "left"
0.0 "f" 310 ["TERM0-1059"] "thrL" "PM00249" 0.198905992329492 "TU0-42486" ["EG11277"] 148
657.057317358791 "f" 5022 ["TERM_WC-2174"] "thrLABC" "PM00249" 0.231049060186648 "TU00178" ["EG10998", "EG10999", "EG11000", "EG11277"] 148
- promoter_id: matches promoter id in reconstruction/ecoli/flat/promoters.tsv
- gene_id: matches id in reconstruction/ecoli/flat/genes.tsv
- id: matches exactly those used in BioCyc, which is quite nice, might be more or less standardized:
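The cross-references between these files can be sketched as a simple join. The rows below are copied from the transcriptionUnits.tsv and promoters.tsv samples, reduced to the columns used here. The helper `describe` is hypothetical, and treating "left" and "right" as inclusive coordinates is an assumption.

```python
# Rows copied from the samples above, keeping only the columns needed here.
transcription_units = [
    {"id": "TU0-42486", "name": "thrL", "promoter_id": "PM00249",
     "left": 148, "right": 310, "gene_id": ["EG11277"]},
    {"id": "TU00178", "name": "thrLABC", "promoter_id": "PM00249",
     "left": 148, "right": 5022,
     "gene_id": ["EG10998", "EG10999", "EG11000", "EG11277"]},
]
promoters = {"PM00249": {"name": "thrLp", "position": 148, "direction": "+"}}


def describe(tu):
    """Hypothetical helper: join a transcription unit with its promoter."""
    promoter = promoters[tu["promoter_id"]]
    # Assumption: "left" and "right" are inclusive genome coordinates.
    span = tu["right"] - tu["left"] + 1
    starts_at_promoter = tu["left"] == promoter["position"]
    return (tu["name"], promoter["name"], span, len(tu["gene_id"]), starts_at_promoter)


summaries = [describe(tu) for tu in transcription_units]
for summary in summaries:
    print(summary)
```

Both transcription units start at the thrLp promoter position. The short one covers only thrL, while the long one covers all four genes.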
reconstruction/ecoli/flat/genes.tsv
"length" "name" "seq" "rnaId" "coordinate" "direction" "symbol" "type" "id" "monomerId"
66 "thr operon leader peptide" "ATGAAACGCATT..." "EG11277_RNA" 189 "+" "thrL" "mRNA" "EG11277" "EG11277-MONOMER"
2463 "ThrA" "ATGCGAGTGTTG" "EG10998_RNA" 336 "+" "thrA" "mRNA" "EG10998" "ASPKINIHOMOSERDEHYDROGI-MONOMER"
reconstruction/ecoli/flat/metabolites.tsv
contains metabolite information. Sample lines:
"id" "mw7.2" "location"
"HOMO-SER" 119.12 ["n", "j", "w", "c", "e", "m", "o", "p", "l", "i"]
"L-ASPARTATE-SEMIALDEHYDE" 117.104 ["n", "j", "w", "c", "e", "m", "o", "p", "l", "i"]
In the case of the enzyme thrA, one of the two reactions it catalyzes is "L-aspartate 4-semialdehyde" into "Homoserine".
Starting from the enzyme page: biocyc.org/gene?orgid=ECOLI&id=EG10998 we reach the reaction page: biocyc.org/ECOLI/NEW-IMAGE?type=REACTION&object=HOMOSERDEHYDROG-RXN which has reaction ID HOMOSERDEHYDROG-RXN, and that page clarifies the IDs:
- biocyc.org/compound?orgid=ECOLI&id=L-ASPARTATE-SEMIALDEHYDE: "L-aspartate 4-semialdehyde" has ID L-ASPARTATE-SEMIALDEHYDE
- biocyc.org/compound?orgid=ECOLI&id=HOMO-SER: "Homoserine" has ID HOMO-SER
so these are the compounds that we care about.
reconstruction/ecoli/flat/reactions.tsv
contains chemical reaction information. Sample lines:
"reaction id" "stoichiometry" "is reversible" "catalyzed by"
"HOMOSERDEHYDROG-RXN-HOMO-SER/NAD//L-ASPARTATE-SEMIALDEHYDE/NADH/PROTON.51." {"NADH[c]": -1, "PROTON[c]": -1, "HOMO-SER[c]": 1, "L-ASPARTATE-SEMIALDEHYDE[c]": -1, "NAD[c]": 1} false ["ASPKINIIHOMOSERDEHYDROGII-CPLX", "ASPKINIHOMOSERDEHYDROGI-CPLX"]
"HOMOSERDEHYDROG-RXN-HOMO-SER/NADP//L-ASPARTATE-SEMIALDEHYDE/NADPH/PROTON.53." {"NADPH[c]": -1, "NADP[c]": 1, "PROTON[c]": -1, "L-ASPARTATE-SEMIALDEHYDE[c]": -1, "HOMO-SER[c]": 1} false ["ASPKINIIHOMOSERDEHYDROGII-CPLX", "ASPKINIHOMOSERDEHYDROGI-CPLX"]
- catalyzed by: here we see ASPKINIHOMOSERDEHYDROGI-CPLX, which we can guess is a protein complex made out of ASPKINIHOMOSERDEHYDROGI-MONOMER, which is the ID for the thrA we care about! This is confirmed in complexationReactions.tsv.
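The stoichiometry cell is plain JSON, so it can be read with the standard library. The sketch below parses the first sample reaction and separates the two sides. Reading negative coefficients as consumed and positive ones as produced is an assumption, though it is consistent with "L-aspartate 4-semialdehyde" being turned into "Homoserine".

```python
import json

# "stoichiometry" cell copied from the first sample reaction above.
stoichiometry = json.loads(
    '{"NADH[c]": -1, "PROTON[c]": -1, "HOMO-SER[c]": 1, '
    '"L-ASPARTATE-SEMIALDEHYDE[c]": -1, "NAD[c]": 1}'
)

# Assumption: negative coefficients are consumed, positive ones are produced.
substrates = sorted(m for m, coeff in stoichiometry.items() if coeff < 0)
products = sorted(m for m, coeff in stoichiometry.items() if coeff > 0)

print(" + ".join(substrates), "->", " + ".join(products))
```

The `[c]` suffix on every molecule matches the `c` abbreviation from compartments.tsv, so the whole reaction takes place in the cytosol.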
reconstruction/ecoli/flat/complexationReactions.tsv
contains information about chemical reactions that produce protein complexes:
"process" "stoichiometry" "id" "dir"
"complexation" [ { "molecule": "ASPKINIHOMOSERDEHYDROGI-CPLX", "coeff": 1, "type": "proteincomplex", "location": "c", "form": "mature" }, { "molecule": "ASPKINIHOMOSERDEHYDROGI-MONOMER", "coeff": -4, "type": "proteinmonomer", "location": "c", "form": "mature" } ] "ASPKINIHOMOSERDEHYDROGI-CPLX_RXN" 1
The coeff is how many monomers need to get together to form the final complex. This can be seen from the Summary section of ecocyc.org/gene?orgid=ECOLI&id=ASPKINIHOMOSERDEHYDROGI-MONOMER:
Aspartate kinase I / homoserine dehydrogenase I comprises a dimer of ThrA dimers. Although the dimeric form is catalytically active, the binding equilibrium dramatically favors the tetrameric form. The aspartate kinase and homoserine dehydrogenase activities of each ThrA monomer are catalyzed by independent domains connected by a linker region.
Fantastic literature summary! Can't find that in database form there however.
reconstruction/ecoli/flat/proteinComplexes.tsv
contains protein complex information:
"name" "comments" "mw" "location" "reactionId" "id"
"aspartate kinase / homoserine dehydrogenase" "" [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 356414.04399999994, 0.0, 0.0, 0.0, 0.0] ["c"] "ASPKINIHOMOSERDEHYDROGI-CPLX_RXN" "ASPKINIHOMOSERDEHYDROGI-CPLX"
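The complex mass can be cross-checked against the monomer mass and the complexation coefficient. This is a small sketch using only numbers quoted in this overview: the protein-class mw of ThrA from proteins.tsv, the monomer coeff from complexationReactions.tsv, and the protein-class mw of the complex from proteinComplexes.tsv.

```python
import math

monomer_mw = 89103.51099999998   # protein-class mw of ThrA in proteins.tsv
complex_mw = 356414.04399999994  # protein-class mw of the complex in proteinComplexes.tsv
monomer_coeff = -4               # coeff of the monomer in complexationReactions.tsv

# Four monomers are consumed to make one complex, so the masses should add up.
predicted_mw = -monomer_coeff * monomer_mw
mass_is_conserved = math.isclose(predicted_mw, complex_mw, rel_tol=1e-9)

print(predicted_mw, complex_mw, mass_is_conserved)
```

The complex weighs four times the monomer, which agrees with the tetramer described in the EcoCyc summary.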
reconstruction/ecoli/flat/protein_half_lives.tsv
contains the half-life of proteins. Very few proteins are listed however for some reason.
reconstruction/ecoli/flat/tfIds.csv
: transcription factors information:
"TF" "geneId" "oneComponentId" "twoComponentId" "nonMetaboliteBindingId" "activeId" "notes"
"arcA" "EG10061" "PHOSPHO-ARCA" "PHOSPHO-ARCA"
"fnr" "EG10325" "FNR-4FE-4S-CPLX" "FNR-4FE-4S-CPLX"
"dksA" "EG10230"
Cool data embedded in the Bitcoin blockchain Early AtomSea & EMBII uploads
These are of course likely all made by AtomSea & EMBII themselves while developing/testing their upload system.
They are also artsy people themselves, and as pointed out at twitter.com/AllenVandever/status/1563964396656812034 what they were doing was basically non-fungible token art, which became much much more popular a few years later around 2021.
The first upload that we could find at github.com/cirosantilli/bitcoin-inscription-indexer/tree/3f53e152ec9bb0d070dbcb8f9249d92f89effa70#atomsea-index was tx 44e80475dc363de2c7ee17b286f8cd49eb146165a79968a62c1c2c4cf80772c9 on block 272573 (2013-12-01) but it does not show on Bitfossil: bitfossil.org/44e80475dc363de2c7ee17b286f8cd49eb146165a79968a62c1c2c4cf80772c9/. This was due to an upload bug explained by the following entry. By looking at the ASCII data at github.com/cirosantilli/bitcoin-inscription-indexer/blob/master/data/out/0272.txt#L449 we can see that this is meant to contain the same content as the following message: a quote from the Bhagavad Gita, so this is definitely a bugged version of the following one.
The next one is tx c9d1363ea517cd463950f83168ce8242ef917d99cd6518995bd1af927d335828 block 272577 (2013-12-02). It actually shows on Bitfossil and it reads:
He who regards
With an eye that is equal
Friends and comrades,
The foe and the kinsman,
The vile, the wicked,
The men who judge him,
And those who belong
To neither faction:
He is the greatest.
followed by a bug message, which is definitely a reference to the previous non-visible bugged upload bitfossil.org/4b72a223007eab8a951d43edc171befeabc7b5dca4213770c88e09ba5b936e17/, TODO understand exactly how they fucked up. This illustrates the beauty of the blockchain very well: unlike with version control, you don't just see selected snapshots: you see actual debug logs!!!
WeAreStarStuff.jpg
The filename is of course a reference to the quote/idea: We Are Made of Star-Stuff that was much popularized by Carl Sagan.
bitfossil.org/fac0b9a4f90414710b806fd286e020aea2404498946845ef3783f305dd4cd3a7 (2024-01-13) contains a cropped version with only AtomSea present.
HugPuddle.jpg
The fourth AtomSea & EMBII upload, and the second image. Message:
HugPuddle Testing Apertus Disk Drive
And then finally we meet Chiharu, EMBII's partner, with her hair painted blond (she's Japanese): ILoveYouMore.jpg.
Then there are two undecoded ones TODO investigate:
Then Nelson-Mandela.jpg.
Then there's an approximation of pi as ASCII decimal fraction on tx 70fd289901bae0409f27237506c330588d917716944c6359a8711b0ad6b4ce76 from block 273522 (2013-12-07):
3.1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679821480865132823066470938446095505822317253594081284811174502841027019385211055596446229489549303819644288109756659334461284756482337867831652712019091456485669234603486104543266482133936072602491412737245870066063155881748815209209628292540917153643678925903600113305305488204665213841469519415116094330572703657595919530921861173819326117931051185480744623799627495673518857527248912279381830119491298336733624406566430860213949463952247371907021798609437027705392171762931767523846748184676694051320005681271452635608277857713427577896091736371787214684409012249534301465495853710507922796892589235420199561121290219608640344181598136297747713099605187072113499999983729780499510597317328160963185950244594553469083026425223082533446850352619311881710100031378387528865875332083814206171776691473035982534904287554687311595628638823537875937519577818577805321712268066130019278766111959092164201989
tx b8b9f50a354166c46b69ecd47a0fbd20ee78c3471d2557bf275aff1b4cf4752d (2013-12-07, on bitfossil.org) contains Where the Sidewalk Ends by Shel Silverstein:
There is a place where the sidewalk ends
And before the street begins,
And there the grass grows soft and white,
And there the sun burns crimson bright,
And there the moon-bird rests from his flight
To cool in the peppermint wind.
tx 56768b30dec33bd284223d85c23087975e2360b3391d20d505aa59a5675e5379 (2013-12-13, on bitfossil.org):
Dear Aliens,
Hey.
Sincerely,
EMBII & AtomSea
tx 415c702759893c63b3a57a7d196b014e51b2a33d2396c74b8e71acfaff6b9360 (2013-12-14) contains a poem by 13th century Persian poet Rumi (TODO find bitfossil.org toplevel), starting with:
My dear friend
never lose hope
when the Beloved
sends you away.
Reproduced e.g. at: www.abuddhistlibrary.com/Buddhism/H%20-%20World%20Religions%20and%20Poetry/World%20Religions/Islam/Teachers/Rumi/My%20dear%20friend/Rumi%20-%20my%20dear%20friend.htm
bitfossil.org/73ca50321147bac9010bec43d63f7f76857fe9ede240cc89710e28723fdb242f/ (2013-12-14) has message:
MULTIFILE SUPPORT TEST
and links to 3 .txt files 1.txt, 2.txt, 3.txt containing single characters 1, 2, and 3.
CompressedLogo.png. Source. 2013-12-20. Message:
Colby Nelson and myself burnt the midnight oils designing the APERTUS imagery last night....Thanks Colby for all your help.
Possibly www.linkedin.com/in/colby-nelson-59b538207/.
Contains an Apertus logo which is used on bitfossil.org/ itself, presumably they were designing that logo.
Similar to a college, but led by religious denomination leaders rather than fellows.
Classification of 5-transitive groups
Apparently only Mathieu group $M_{12}$ and Mathieu group $M_{24}$.
www.maths.qmul.ac.uk/~pjc/pps/pps9.pdf mentions:
The automorphism group of the extended Golay code is the 54-transitive Mathieu group $M_{24}$. This is one of only two finite 5-transitive groups other than symmetric and alternating groups
Hmm, is that 54, or more likely 5 and 4?
scite.ai/reports/4-homogeneous-groups-EAKY21 quotes link.springer.com/article/10.1007%2FBF01111290 which suggests that it is also another one of the Mathieu groups, math.stackexchange.com/questions/698327/classification-of-triply-transitive-finite-groups#comment7650505_3721840 and en.wikipedia.org/wiki/Mathieu_group_M12 mention $M_{12}$.
Minimal example: github.com/cirosantilli/x86-bare-metal-examples/blob/5c672f73884a487414b3e21bd9e579c67cd77621/paging.S
Like everything else in programming, the only way to really understand this is to play with minimal examples.
x86 Paging Tutorial Example: simplified single-level paging scheme
x86 Paging Tutorial PAE and PSE page table schemes
If either PAE or PSE is active, different paging level schemes are used:
- no PAE and no PSE: 10 | 10 | 12
- no PAE and PSE: 10 | 22. 22 is the offset within the 4 MiB page, since 22 bits address 4 MiB.
- PAE and no PSE: 2 | 9 | 9 | 12. The design reason why 9 is used twice instead of 10 is that now entries cannot fit anymore into 32 bits, which were all filled up by 20 address bits and 12 meaningful or reserved flag bits. The reason is that 20 bits are not enough anymore to represent the address of page tables: 24 bits are now needed because of the 4 extra wires added to the processor. Therefore, the designers decided to increase entry size to 64 bits, and to make them fit into a single page table it is necessary to reduce the number of entries to 2^9 instead of 2^10.
- PAE and PSE: 2 | 9 | 21
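The four bit layouts can be sketched as a small address splitter. This is an illustrative model only: the function name `split_linear_address` and the `SCHEMES` table are made up for this example. It also treats PSE as a global switch like the list above does, whereas on real hardware the large page size is selected per page directory entry.

```python
# Field widths, most significant bits first, as listed above.
SCHEMES = {
    (False, False): (10, 10, 12),  # no PAE, no PSE
    (False, True): (10, 22),       # no PAE, PSE
    (True, False): (2, 9, 9, 12),  # PAE, no PSE
    (True, True): (2, 9, 21),      # PAE, PSE
}


def split_linear_address(address, pae, pse):
    """Hypothetical helper: split a 32-bit linear address into its fields."""
    widths = SCHEMES[(pae, pse)]
    assert sum(widths) == 32, "every scheme must cover the full 32-bit address"
    fields = []
    shift = 32
    for width in widths:
        shift -= width
        fields.append((address >> shift) & ((1 << width) - 1))
    return tuple(fields)


# Entries per 4 KiB table: 32-bit entries without PAE, 64-bit entries with PAE.
entries_without_pae = 4096 // 4
entries_with_pae = 4096 // 8

for flags in SCHEMES:
    print(flags, split_linear_address(0x00401004, *flags))
print(entries_without_pae, entries_with_pae)
```

The entry counts come out as 2^10 and 2^9, which is why each table index shrinks from 10 to 9 bits once entries grow to 64 bits.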
x86 Paging Tutorial Second Level Address Translation
Two level address translation to make OS emulation more efficient.
WhatsApp profile information is public by default
This means that all secret services in the world have already scraped this information for everyone that uses WhatsApp!!!
They just have to go incrementally through the list of all phone numbers... 001 0000 0000, 001 0000 0001, 001 0000 0002, etc. and then they can deduce who has which phone number.
OMG... it is analogous to the Facebook profile face dump.
Sure, it is forbidden in theory: faq.whatsapp.com/general/security-and-privacy/about-harvesting-personal-information/.
The course outline is given in a "handbook", one or more PDF files that contain what people will learn and other practicalities. There is a full list of handbooks at: www.ox.ac.uk/students/academic/guidance/undergraduate/handbooks, but many of them are closed. The system is so closed that even the fucking course list is closed, e.g. all links at: www2.physics.ox.ac.uk/students/undergraduates are closed. Insane.
Autonomous agents research group of the University of Edinburgh
University of California Los Angeles