Ciro Santilli @cirosantilli 37

 Incoming links: `grep`

E. Coli Whole Cell Model by Covert Lab / Mass fraction summary plot analysis Created 2024-12-04 Updated 2025-07-16

Let's look into a sample plot, out/manual/plotOut/svg_plots/massFractionSummary.svg, and try to understand as much as we can about what it means and how it was generated.

This plot contains how much of each type of mass is present in all cells. Since we simulated just one cell, it will be the same as the results for that cell.

We can see that all of them grow more or less linearly, perhaps as the start of an exponential.

total dry mass (mass excluding water)
protein mass
rRNA mass
mRNA mass
DNA mass. The last label is not very visible on the plots, but we can deduce it from the source code.
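
To get a feeling for why an exponential can look "more or less linear" on such a plot, here is a minimal sketch. It assumes, purely for illustration, that a mass exactly doubles over one cell cycle, and compares that exponential with the straight line through the same start and end points. None of the numbers come from the simulation output.

```python
def exponential_mass(x):
    """Relative mass at fraction x (0..1) of the cycle under exponential growth.

    Assumption: the mass doubles exactly once per cycle.
    """
    return 2.0 ** x


def linear_mass(x):
    """Straight line through the same start (1.0) and end (2.0) points."""
    return 1.0 + x


# Largest gap between the two curves, sampled at 1001 evenly spaced points.
max_gap = max(
    linear_mass(i / 1000) - exponential_mass(i / 1000) for i in range(1001)
)
print(f"max gap: {max_gap:.3f} of the initial mass")
```

Under that assumption the gap stays below 0.09 of the initial mass over the whole cycle, so telling the two shapes apart by eye on a single generation is hard.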

By grepping the title "Cell mass fractions" in the source code, we see the files:

models/ecoli/analysis/cohort/massFractionSummary.py
models/ecoli/analysis/multigen/massFractionSummary.py
models/ecoli/analysis/variant/massFractionSummary.py

which must correspond to the different massFractionSummary plots throughout different levels of the hierarchy.

By reading models/ecoli/analysis/variant/massFractionSummary.py a little bit, we see that:

the plotting is done with Matplotlib, hurray
it is reading its data from files under ./out/manual/wildtype_000000/000000/generation_000000/000000/simOut/Mass/, more precisely ./out/manual/wildtype_000000/000000/generation_000000/000000/simOut/Mass/columns/<column-name>/data. They are binary files however.
Looking at the source for wholecell/io/tablereader.py shows that those files just use a standard NumPy serialization mechanism. Maybe they should have used the Hierarchical Data Format instead.
We can also take this opportunity to try and find where the data is coming from. The Mass in ./out/manual/wildtype_000000/000000/generation_000000/000000/simOut/Mass/ looks like an ID, so we grep for it and reach models/ecoli/listeners/mass.py.
From this we understand that all data that is to be saved from a simulation must be coming from listeners: likely nothing, or not much, is dumped by default, because otherwise it would take up too much disk space. You have to explicitly say what it is that you want to save via a listener that acts on each time step.
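
The listener idea can be sketched in a few lines of plain Python. This is not the Covert Lab code: the class name `MassListener`, the `update` method, the column names and all numbers are hypothetical, and only illustrate "save exactly what you asked for, once per time step".

```python
class MassListener:
    """Hypothetical listener: records selected columns once per time step."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = []  # one dict per time step

    def update(self, time, state):
        # Copy only the requested columns; the rest of `state` is not saved.
        row = {"time": time}
        for name in self.columns:
            row[name] = state[name]
        self.rows.append(row)


def run_simulation(listener, steps=3):
    # Toy state with made-up numbers, standing in for the real simulation.
    state = {"dryMass": 100.0, "proteinMass": 55.0, "scratch": 0.0}
    for t in range(steps):
        state["dryMass"] *= 1.01
        state["proteinMass"] *= 1.01
        state["scratch"] += 1.0  # internal value that nobody asked to save
        listener.update(t, state)


listener = MassListener(["dryMass", "proteinMass"])
run_simulation(listener)
print(len(listener.rows), sorted(listener.rows[0]))
```

In this toy version, `scratch` changes at every step but never reaches the saved rows, which is the disk space trade-off described above.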

Figure 1. Minimal condition mass fraction plot. Source. File name: `out/manual/plotOut/svg_plots/massFractionSummary.svg`

More plot types will be explored at time series run variant, where we will contrast two runs with different growth mediums.


Updates / Generating test data for full text search tests Created 2024-12-23 Updated 2025-07-16


I've been thinking lightly about adding full text search to OurBigBook.

For example, at docs.ourbigbook.com/news/article-and-topic-id-prefix-search article search was added, but it only finds a match if your query appears right at the start of a title, e.g. for the title:

Fundamental theorem of calculus

you'd get a hit for:

fundamental

but not for

calculus
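
The difference between the two behaviours can be sketched in plain Python. This is only an illustration of the matching rules, not how OurBigBook or PostgreSQL implement them: both functions are hypothetical and do a linear scan over an in-memory list.

```python
TITLES = ["Fundamental theorem of calculus"]


def title_prefix_search(query, titles):
    """Match only when the query is a prefix of the whole title."""
    q = query.lower()
    return [t for t in titles if t.lower().startswith(q)]


def any_word_prefix_search(query, titles):
    """Match when the query is a prefix of any word in the title."""
    q = query.lower()
    return [
        t for t in titles
        if any(word.startswith(q) for word in t.lower().split())
    ]


print(title_prefix_search("fundamental", TITLES))   # one hit
print(title_prefix_search("calculus", TITLES))      # no hit
print(any_word_prefix_search("calculus", TITLES))   # one hit
```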

To do this efficiently, we need full text search, which PostgreSQL implements.

But finding a clean way to generate test data to measure the speedup was not so easy, and exploring this led me to publish a few slightly improved methods where Googlers can now find them:

unix.stackexchange.com/questions/97160/is-there-something-like-a-lorem-ipsum-generator/787733#787733 I propose a neat random "sentence" generator using common CLI tools like grep and sed and the pre-installed Ubuntu dictionary /usr/share/dict/american-english:
```
grep -v "'" /usr/share/dict/american-english |
shuf -r |
paste -d ' ' $(printf "%4s" | sed 's/ /- /g') |
sed -e 's/^\(.\)/\U\1/;s/$/./' |
head -n10000000 \
> lorem.txt
```
- to achieve that, I also proposed two superior "join every N lines" methods for the CLI: stackoverflow.com/questions/25973140/joining-every-group-of-n-lines-into-one-with-bash/79257780#79257780, notably this awk poem:
  seq 10 | awk '{ printf("%s%s", NR == 1 ? "" : NR % 3 == 1 ? "\n" : " ", $0 ) } END { printf("\n") }'
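
For comparison, the same "join every N lines" grouping can be sketched in Python. The function name `join_every_n` is made up for this example, and it buffers the whole input in memory, unlike the streaming awk version.

```python
def join_every_n(lines, n, sep=" "):
    """Join each consecutive group of n lines into one line.

    The last group may be shorter if the input length is not a multiple of n.
    """
    lines = list(lines)
    return [sep.join(lines[i:i + n]) for i in range(0, len(lines), n)]


# Same input as `seq 10`, grouped by 3 like the awk command.
for row in join_every_n((str(i) for i in range(1, 11)), 3):
    print(row)
```

This prints `1 2 3`, `4 5 6`, `7 8 9` and `10` on four separate lines.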

stackoverflow.com/questions/3371503/sql-populate-table-with-random-data/79255281#79255281 I propose:

a clean PostgreSQL random string stored procedure that picks random characters from an allowed character list

CREATE OR REPLACE FUNCTION random_string(int) RETURNS TEXT as $$
select
string_agg(substr(characters, floor(random() * length(characters))::integer + 1, 1), '') as random_word
from (values('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-    ')) as symbols(characters)
join generate_series(1, $1) on 1 = 1
$$ language sql;

first generating PostgreSQL data as CSV, and then importing the CSV into PostgreSQL as a more flexible method. This can also be done in a streaming fashion from stdin which is neat.
```
python generate_data.py 10 | psql mydb -c '\copy "mytable" FROM STDIN'
```
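
The contents of `generate_data.py` are not shown here, so the following is only a hypothetical sketch of what such a script could look like. It assumes `mytable` has exactly two columns, an integer and a text, and writes one tab-separated row per line, which is the default text format that `\copy ... FROM STDIN` reads.

```python
import random
import string
import sys


def generate_rows(n, seed=0):
    """Yield n tab-separated rows: an integer id and a random 8-char string.

    Only letters and digits are used, so no escaping is needed for the
    default text format of \\copy.
    """
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits
    for i in range(n):
        text = "".join(rng.choice(alphabet) for _ in range(8))
        yield f"{i}\t{text}"


if __name__ == "__main__":
    # Row count comes from the first CLI argument, defaulting to 10.
    has_count = len(sys.argv) > 1 and sys.argv[1].isdigit()
    count = int(sys.argv[1]) if has_count else 10
    for row in generate_rows(count):
        print(row)
```

Because the rows are produced by a generator and printed one at a time, nothing forces the whole data set into memory, which fits the streaming usage above.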

stackoverflow.com/questions/16020164/psqlexception-error-syntax-error-in-tsquery/79437030#79437030 regarding the safe generation of prefix search tsquery from user inputs without query errors, I've learned about websearch_to_tsquery and further highlighted a possible tsquery -> text -> tsquery approach that might be correct for prefix searches
stackoverflow.com/questions/67438575/fulltext-search-using-sequelize-postgres/79439253#79439253 I put everything together into a minimal Sequelize example, ready for use in OurBigBook
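
To illustrate the "no query errors from user input" problem, here is a minimal sketch of a different and cruder technique than the tsquery -> text -> tsquery approach mentioned above: keep only letters and digits from the input, and build the prefix query string by hand. The function name `prefix_tsquery` is hypothetical, and this is not the code from the linked answers.

```python
import re


def prefix_tsquery(user_input):
    """Build a prefix-search string meant for PostgreSQL's to_tsquery().

    Only runs of letters and digits are kept, so tsquery operators typed by
    the user (& | ! ( ) : ') never reach the query string.
    """
    words = re.findall(r"[^\W_]+", user_input.lower())
    return " & ".join(f"{w}:*" for w in words)


print(prefix_tsquery("Fundamental theo"))
print(prefix_tsquery("calc & (ulus"))
```

The first call prints `fundamental:* & theo:*` and the second `calc:* & ulus:*`. The result would still be passed to `to_tsquery` as a bound parameter rather than concatenated into SQL, and an empty result (input with no letters or digits) should probably skip the search entirely.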

Finally I did a writeup summarizing PostgreSQL full text search: Section "PostgreSQL full-text search" and also dumped it at: www.reddit.com/r/PostgreSQL/comments/12yld1o/is_it_worth_using_postgres_builtin_fulltext/ for good measure.

