This one is good: stackoverflow.com/questions/36533429/generate-random-string-in-postgresql/44200391#44200391 as it also describes how to generate multiple values.
with symbols(characters) as (VALUES ('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'))
select string_agg(substr(characters, floor(random() * length(characters) + 1)::integer, 1), '')
from symbols
join generate_series(1,8) as word(chr_idx) on 1 = 1 -- word length
join generate_series(1,10000) as words(idx) on 1 = 1 -- # of words
group by idx;
Then you can insert it into a row with:
create table tmp(s text);
insert into tmp(s)
select s from
(
with symbols(characters) as (VALUES ('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'))
select string_agg(substr(characters, floor(random() * length(characters) + 1)::integer, 1), '') as asdf
from symbols
join generate_series(1,8) as word(chr_idx) on 1 = 1 -- word length
join generate_series(1,10000) as words(idx) on 1 = 1 -- # of words
group by idx
) as sub(s);
A more convenient approach is likely to define the function:
CREATE OR REPLACE FUNCTION random_string(int) RETURNS TEXT as $$
select
string_agg(substr(characters, floor(random() * length(characters) + 1)::integer, 1), '') as random_word
from (values('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 --')) as symbols(characters)
join generate_series(1, $1) on 1 = 1
$$ language sql;
And then:
create table tmp(s text, t text);
insert into tmp(s) select random_string(10) from generate_series(1, 10);
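When picking a random 1-based index into the symbol string, it matters whether the value is rounded or floored, because PostgreSQL rounds to nearest when casting double precision to integer. The following is a minimal Python sketch of the three index formulas one might write; the function names are illustrative only, and Python's round stands in for the PostgreSQL cast:

```python
import random
from collections import Counter


def rounded_index(n, rng):
    # Mimics (random() * n + 1)::integer, where the cast rounds to nearest.
    # The result can be n + 1, which is past the end of the string.
    return round(rng.random() * n + 1)


def rounded_index_shrunk(n, rng):
    # Mimics (random() * (n - 1) + 1)::integer: stays within 1..n, but the
    # first and last positions are picked about half as often as the others.
    return round(rng.random() * (n - 1) + 1)


def floored_index(n, rng):
    # Mimics floor(random() * n + 1)::integer: uniform over 1..n.
    return int(rng.random() * n) + 1


def histogram(index_fn, n, samples=100_000, seed=0):
    # Count how often each index comes up for a fixed seed.
    rng = random.Random(seed)
    return Counter(index_fn(n, rng) for _ in range(samples))


if __name__ == "__main__":
    for fn in (rounded_index, rounded_index_shrunk, floored_index):
        print(fn.__name__, sorted(histogram(fn, 4).items()))
```

With an out-of-range index, substr returns an empty string, so some generated words would come out shorter than requested.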
- output one column per line: stackoverflow.com/questions/9604723/alternate-output-format-for-psql-showing-one-column-per-line-with-column-name
- PostgreSQL does not automatically index foreign keys! stackoverflow.com/questions/970562/postgres-and-indexes-on-foreign-keys-and-primary-keys
Convert project Gutenberg King James Bible to verse number to text dataset Updated 2024-12-15 +Created 2024-12-05
This section is about converting: www.gutenberg.org/ebooks/10, and most likely the plaintext version: stackoverflow.com/a/43060761/895245 to the same data format as www.kaggle.com/datasets/oswinrh/bible mapping book/chapter/verse to the text:
1 1 1 In the beginning God created the heaven and the earth.
1 1 2 And the earth was without form, and void; and darkness was upon the face of the deep.
One particular annoyance is that the txt version has multiple verses per line at times.
We'd likely just want to use a slightly modified version of: stackoverflow.com/a/43060761/895245 that searches for patterns of type (\d+):(\d+) with incremental integers.
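A minimal sketch of that idea in Python, assuming the input is the text of a single book where each verse is introduced by a chapter:verse marker such as 1:1. The function name split_verses is hypothetical, and detecting book boundaries (needed for the book column) is left out:

```python
import re

VERSE_RE = re.compile(r"(\d+):(\d+)")


def _clean(text):
    # Collapse line breaks and runs of whitespace into single spaces.
    return " ".join(text.split())


def split_verses(book_text):
    """Return a list of (chapter, verse, text) tuples for one book.

    A marker is accepted only if it is the next verse of the current
    chapter or verse 1 of the next chapter, so that stray "n:m" patterns
    inside the text are not mistaken for verse starts.
    """
    verses = []
    current = None  # (chapter, verse) of the verse being collected
    text_start = 0
    for m in VERSE_RE.finditer(book_text):
        chapter, verse = int(m.group(1)), int(m.group(2))
        if current is None:
            ok = (chapter, verse) == (1, 1)
        else:
            ok = (chapter, verse) in (
                (current[0], current[1] + 1),
                (current[0] + 1, 1),
            )
        if not ok:
            continue  # leave it as part of the current verse text
        if current is not None:
            verses.append((*current, _clean(book_text[text_start:m.start()])))
        current = (chapter, verse)
        text_start = m.end()
    if current is not None:
        verses.append((*current, _clean(book_text[text_start:])))
    return verses
```

Because the split is driven by the markers rather than by line breaks, several verses on one line and one verse spread over several lines are handled the same way.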
Verse number to text:
- www.kaggle.com/datasets/oswinrh/bible verse to text. TODO how was it generated
- gist.github.com/sebastiancarlos/2fdb072e46ee80038d6da196cc0bb8bc
- sebastiancarlos.com/religions-please-migrate-your-holy-texts-to-json-76bce058291d
- www.reddit.com/r/programming/comments/wv6q6r/religions_please_migrate_your_holy_texts_to_json/
- github.com/acrawford73/postgresql-bible-kjv pre-extracted CSV with undocumented method
Actual NLP into text:
- github.com/BradyStephenson/bible-data actually digs a bit into the text
Frozen and cut on Microtome at 1mm intervals.
It is obscene that the DVSA, which is the UK organ that officially sets up the theory driving tests, does not make all of its material open.
As such, it uses its monopoly on the test as an advantage to sell books and services that allow you to pass the tests.
Particularly obscene is their focus on an online service: www.safedrivingforlife.info/shop/official-dvsa-theory-test-kit-car-drivers-elearning (archive) that you can only use for 30 days and that costs 15 pounds as of 2024. This way you can't even buy the book used or go to a library.
Obscene.
Paying to take the test is fine. But paying for learning materials which help everyone become better drivers and save lives? Obscene.
The UK organ responsible for granting driving licenses.
This one has a 3D model of C. elegans containing all the cells, browsable on the browser at: browser.openworm.org/.
Fantastic resource that contains cross sections of C. elegans at various lengths of its body. Presumably frozen and cut with a Microtome and then scanned with electron microscopy.
Shame that there are so many parts missing.
It's the Visible Human Project, but for C. elegans!
This contains the C. elegans connectome.
The browseable thing is this massive interactive PDF: wormwiring.org/papers/Interactive-Diagram.pdf. It lists neurons from the C. elegans cell lineage using the standard cell names, and how they connect to each other. Some make a surprising amount of connections.
TODO what does it contain. Does it have metabolic pathways?
Let's look into a sample plot, out/manual/plotOut/svg_plots/massFractionSummary.svg, and try to understand as much as we can about what it means and how it was generated.
This plot contains how much of each type of mass is present in all cells. Since we simulated just one cell, it will be the same as the results for that cell.
We can see that all of them grow more or less linearly, perhaps as the start of an exponential.
By grepping the title "Cell mass fractions" in the source code, we see the files:
models/ecoli/analysis/cohort/massFractionSummary.py
models/ecoli/analysis/multigen/massFractionSummary.py
models/ecoli/analysis/variant/massFractionSummary.py
which must correspond to the different massFractionSummary plots throughout different levels of the hierarchy.
By reading models/ecoli/analysis/variant/massFractionSummary.py a little bit, we see that:
- the plotting is done with Matplotlib, hurray
- it is reading its data from files under ./out/manual/wildtype_000000/000000/generation_000000/000000/simOut/Mass/, more precisely ./out/manual/wildtype_000000/000000/generation_000000/000000/simOut/Mass/columns/<column-name>/data. They are binary files however. Looking at the source for wholecell/io/tablereader.py shows that those are just a standard NumPy serialization mechanism. Maybe they should have used the Hierarchical Data Format instead.
We can also take this opportunity to try and find where the data is coming from. Mass from the ./out/manual/wildtype_000000/000000/generation_000000/000000/simOut/Mass/ looks like an ID, so we grep that and we reach models/ecoli/listeners/mass.py.
From this we understand that all data that is to be saved from a simulation must be coming from listeners: likely nothing, or not much, is dumped by default, because otherwise it would take up too much disk space. You have to explicitly say what it is that you want to save via a listener that acts on each time step.
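The listener idea can be illustrated with a toy sketch in Python. This is not the actual wholecell API: MassListener, run_simulation and the state dictionary are hypothetical names, and the growth law is just an assumed exponential with a fixed doubling time. The point is only that the simulation state holds more values than what ends up stored, and that a listener picks what to keep at each time step:

```python
import math


class MassListener:
    """Hypothetical listener: records only the values it is asked to track."""

    def __init__(self):
        self.time = []
        self.cell_mass = []

    def update(self, t, state):
        # Called once per time step; copies out the values to be saved.
        self.time.append(t)
        self.cell_mass.append(state["cell_mass"])


def run_simulation(n_steps, dt, listeners, doubling_time=3600.0, initial_mass=1.0):
    """Toy simulation loop; only what listeners record outlives the run."""
    rate = math.log(2) / doubling_time
    state = {"cell_mass": initial_mass, "scratch": 0.0}
    for step in range(n_steps + 1):
        t = step * dt
        state["cell_mass"] = initial_mass * math.exp(rate * t)
        # Internal value that no listener asks for, so it is never stored.
        state["scratch"] = t * t
        for listener in listeners:
            listener.update(t, state)
    return state


if __name__ == "__main__":
    listener = MassListener()
    run_simulation(n_steps=60, dt=60.0, listeners=[listener])
    print(len(listener.time), listener.cell_mass[0], listener.cell_mass[-1])
```

In this toy, values that no listener copies out are discarded when the run ends.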
More plot types will be explored at time series run variant, where we will contrast two runs with different growth mediums.
All publicly funded research and teaching materials should be free Updated 2024-12-15 +Created 2024-12-04
Superset of force public university teachers to publish their teaching material with an open license.
One particularly obscene case Ciro Santilli encountered was: all DVSA materials should be free.