Let's look at a sample plot, out/manual/plotOut/svg_plots/massFractionSummary.svg, and try to understand as much as we can about what it means and how it was generated.
This plot shows how much of each type of mass is present in all cells. Since we simulated just one cell, it will be the same as the results for that cell.
We can see that all of them grow more or less linearly, perhaps as the start of an exponential.
By grepping the title "Cell mass fractions" in the source code, we see the files:
- models/ecoli/analysis/cohort/massFractionSummary.py
- models/ecoli/analysis/multigen/massFractionSummary.py
- models/ecoli/analysis/variant/massFractionSummary.py
which must correspond to the different massFractionSummary plots throughout different levels of the hierarchy.
By reading models/ecoli/analysis/variant/massFractionSummary.py a little bit, we see that:
- the plotting is done with Matplotlib, hurray
- it is reading its data from files under ./out/manual/wildtype_000000/000000/generation_000000/000000/simOut/Mass/, more precisely ./out/manual/wildtype_000000/000000/generation_000000/000000/simOut/Mass/columns/<column-name>/data. They are binary files, however.
The source of wholecell/io/tablereader.py shows that they just use a standard NumPy serialization mechanism. Maybe they should have used the Hierarchical Data Format instead.
We can also take this opportunity to try and find where the data is coming from. The Mass component of the path ./out/manual/wildtype_000000/000000/generation_000000/000000/simOut/Mass/ looks like an ID, so we grep for it and reach models/ecoli/listeners/mass.py.
From this we understand that all data to be saved from a simulation must come from listeners: likely nothing, or not much, is dumped by default, because otherwise it would take up too much disk space. You have to explicitly say what you want to save via a listener that acts on each time step.
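To peek at one of these column files without going through the plotting scripts, here is a minimal sketch. It assumes, based only on the remark that tablereader.py uses a standard NumPy serialization, that each columns/<column-name>/data file is a plain concatenation of .npy records, one np.save call per time step. The real layout is whatever wholecell/io/tablereader.py implements, so check that first; the read_column name and the fake data in the demo are hypothetical.

```python
import os
import tempfile

import numpy as np


def read_column(path):
    """Load every .npy record stored back to back in one column file.

    Assumption (hypothetical layout): the writer called np.save once per
    time step on the same open file, so the file is a concatenation of
    .npy records.
    """
    size = os.path.getsize(path)
    records = []
    with open(path, "rb") as f:
        # np.load on an open file consumes exactly one .npy record and
        # leaves the file position right after it, so loop until the end.
        while f.tell() < size:
            records.append(np.load(f))
    if not records:
        return np.empty(0)
    if len(records) == 1:
        # A single record is returned unchanged (e.g. a whole time series
        # saved in one go).
        return records[0]
    # Several records: stack them along a new leading time axis.
    return np.stack(records)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data")
        # Fake column: a toy "listener" appends one value per time step.
        with open(path, "wb") as f:
            for step in range(5):
                np.save(f, np.array([1.0 + 0.1 * step]))
        masses = read_column(path)
        print(masses.shape)  # (5, 1)
```

On a real run you would point read_column at ./out/manual/wildtype_000000/000000/generation_000000/000000/simOut/Mass/columns/<column-name>/data and hand the result to Matplotlib. If it raises an error, the files are not plain .npy records and the reading code in wholecell/io/tablereader.py is the authority.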
More plot types will be explored in time series run variant, where we will contrast two runs with different growth media.
I've been thinking lightly about adding full text search to OurBigBook.
For example, at docs.ourbigbook.com/news/article-and-topic-id-prefix-search article search was added, but it only finds a match if what you search for appears right at the start of a title. E.g. for the title:
Fundamental theorem of calculus
you'd get a hit for:
fundamental
but not for:
calculus
To do this efficiently, we need full text search, which PostgreSQL implements.
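To make the difference concrete, here is a toy Python contrast between prefix matching and word-level matching on the example title. It is only an illustration: the function names are hypothetical, it is not how OurBigBook implements its search, and PostgreSQL's full text search does much more (to_tsvector and to_tsquery with the @@ operator, plus stemming and ranking).

```python
import re


def prefix_match(title: str, query: str) -> bool:
    """Hit only if the query appears right at the start of the title."""
    return title.lower().startswith(query.lower())


def word_match(title: str, query: str) -> bool:
    """Hit if every query word is a word of the title, in any position.

    A crude stand-in for full text search: no stemming, no ranking.
    """
    title_words = set(re.findall(r"\w+", title.lower()))
    query_words = re.findall(r"\w+", query.lower())
    return bool(query_words) and all(w in title_words for w in query_words)


title = "Fundamental theorem of calculus"
print(prefix_match(title, "fundamental"))  # True
print(prefix_match(title, "calculus"))     # False
print(word_match(title, "calculus"))       # True
```

Scanning every title like this is exactly what does not scale; in PostgreSQL the usual approach is to store or compute a tsvector and put a GIN index on it.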
But finding a clean way to generate test data to measure the speedup was not so easy, and exploring this led me to publish a few new, slightly improved methods where Googlers can now find them:
- unix.stackexchange.com/questions/97160/is-there-something-like-a-lorem-ipsum-generator/787733#787733 I propose a neat random "sentence" generator using common CLI tools like grep and sed, plus the pre-installed Ubuntu dictionary /usr/share/dict/american-english:
grep -v "'" /usr/share/dict/american-english | shuf -r | paste -d ' ' $(printf "%4s" | sed 's/ /- /g') | sed -e 's/^\(.\)/\U\1/;s/$/./' | head -n10000000 > lorem.txt
- to achieve that, I also proposed two superior "join every N lines" methods for the CLI: stackoverflow.com/questions/25973140/joining-every-group-of-n-lines-into-one-with-bash/79257780#79257780, notably this awk poem:
seq 10 | awk '{ printf("%s%s", NR == 1 ? "" : NR % 3 == 1 ? "\n" : " ", $0 ) } END { printf("\n") }'
- stackoverflow.com/questions/3371503/sql-populate-table-with-random-data/79255281#79255281 I propose:
- a clean PostgreSQL random string stored procedure that picks random characters from an allowed character list
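CREATE OR REPLACE FUNCTION random_string(int) RETURNS TEXT as $$
  -- floor() makes every character equally likely: a plain ::integer cast rounds,
  -- which could pick position length(characters) + 1 and yield an empty character.
  select string_agg(substr(characters, floor(random() * length(characters) + 1)::integer, 1), '') as random_word
  from (values('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789- ')) as symbols(characters)
  join generate_series(1, $1) on 1 = 1
$$ language sql;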
- a more flexible method: first generate the data as CSV, then import it into PostgreSQL. This can also be done in a streaming fashion from stdin, which is neat:
python generate_data.py 10 | psql mydb -c '\copy "mytable" FROM STDIN'
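The generate_data.py script used in the streaming command is not shown. A minimal hypothetical version, assuming a table created as mytable (id integer, name text), could look like the following. One detail worth knowing: \copy ... FROM STDIN without options uses COPY's text format, which expects tab-separated columns; for real comma-separated CSV you would add WITH (FORMAT csv) to the \copy command.

```python
#!/usr/bin/env python3
# Hypothetical generate_data.py: print N random rows to stdout in COPY's
# default text format (one row per line, columns separated by a tab).
# Assumed target table: mytable (id integer, name text).
import random
import string
import sys

# Same character set as the random_string SQL function above.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "- "


def random_string(n: int) -> str:
    """Return n characters drawn uniformly from ALPHABET."""
    return "".join(random.choice(ALPHABET) for _ in range(n))


def rows(n: int, name_len: int = 16):
    """Yield n lines of the form '<id>\\t<random name>'."""
    for i in range(n):
        yield f"{i}\t{random_string(name_len)}"


def main() -> None:
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 10
    for line in rows(n):
        print(line)


if __name__ == "__main__":
    main()
```

Because rows are printed one at a time, nothing is held in memory or written to disk, so N can be far larger than what would fit in a Python list.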
Finally, I did a writeup summarizing PostgreSQL full text search in Section "PostgreSQL full-text search", and also posted it at www.reddit.com/r/PostgreSQL/comments/12yld1o/is_it_worth_using_postgres_builtin_fulltext/ for good measure.