Kaggle by Ciro Santilli
To be fair, this is one of the least bad ones.
Forsyth-Edwards Notation by Ciro Santilli
The cool thing about this notation is that it showed Ciro Santilli that there is more state to a chess game than just the board itself! Notably:
  • whose move it is next
  • castling availability
  • en passant availability
plus a couple of more boring counters: the halfmove clock used for the fifty-move draw rule, and the fullmove number.
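A FEN string has six space-separated fields, and they map directly onto the state listed above. Here is a minimal sketch of a parser, assuming a well-formed FEN string; `FenState` and `parse_fen` are hypothetical names, and no move legality is checked:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FenState:
    board: List[List[str]]  # 8 ranks, from rank 8 down to rank 1; "." marks an empty square
    active_color: str       # "w" or "b": whose move it is next
    castling: str           # castling availability, e.g. "KQkq", or "-" if none
    en_passant: str         # en passant target square, e.g. "e3", or "-" if none
    halfmove_clock: int     # plies since the last capture or pawn move (fifty-move rule)
    fullmove_number: int    # starts at 1 and is incremented after each Black move


def parse_fen(fen: str) -> FenState:
    fields = fen.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 space-separated fields, got {len(fields)}")
    placement, active, castling, en_passant, halfmove, fullmove = fields

    board: List[List[str]] = []
    for rank in placement.split("/"):
        row: List[str] = []
        for ch in rank:
            if ch.isdigit():
                # A digit n stands for n consecutive empty squares.
                row.extend(["."] * int(ch))
            else:
                # A letter is a piece: uppercase for White, lowercase for Black.
                row.append(ch)
        if len(row) != 8:
            raise ValueError(f"rank {rank!r} does not describe 8 squares")
        board.append(row)
    if len(board) != 8:
        raise ValueError(f"expected 8 ranks, got {len(board)}")

    return FenState(board, active, castling, en_passant, int(halfmove), int(fullmove))


# Position after 1. e4: the board alone would not tell you that it is Black's move,
# that both sides can still castle, or that e3 is an en passant target square.
state = parse_fen("rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1")
print(state.active_color, state.castling, state.en_passant)  # b KQkq e3
```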
Computerphile by Ciro Santilli
MNIST database by Ciro Santilli
70,000 28x28 grayscale images of hand-written digits 0-9, i.e. 10 categories: 60,000 for training and 10,000 for testing.
Playing with it is the de facto computer vision hello world.
But it is important to note that as of the 2010s, the benchmark had become too easy for many applications.
The dataset can be downloaded from yann.lecun.com/exdb/mnist/:
wget \
 http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz \
 http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz \
 http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz \
 http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
but doing so is kind of pointless on its own, as the files use a crazy custom binary format (IDX) that packs all images, or all labels, into a single file. OMG!
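The IDX format is at least simple: a big-endian magic number whose first two bytes are zero, whose third byte is the data type (0x08 for unsigned bytes) and whose fourth byte is the number of dimensions, followed by one 32-bit big-endian size per dimension, followed by the raw data. A minimal sketch of a reader, assuming only the unsigned byte type that MNIST uses; `parse_idx`, `load_idx_gz` and `image_rows` are hypothetical helper names:

```python
import gzip
import struct
from typing import List, Tuple


def parse_idx(data: bytes) -> Tuple[List[int], bytes]:
    """Parse an IDX file held in memory; return (dimension sizes, raw data bytes).

    Only the unsigned byte data type (code 0x08), which is what MNIST uses, is handled.
    """
    zero1, zero2, type_code, ndim = struct.unpack_from(">BBBB", data, 0)
    if (zero1, zero2) != (0, 0):
        raise ValueError("not an IDX file: magic number must start with two zero bytes")
    if type_code != 0x08:
        raise ValueError(f"unsupported data type code 0x{type_code:02x}")
    # One big-endian unsigned 32-bit integer per dimension follows the magic number.
    dims = list(struct.unpack_from(f">{ndim}I", data, 4))
    offset = 4 + 4 * ndim
    count = 1
    for d in dims:
        count *= d
    payload = data[offset:offset + count]
    if len(payload) != count:
        raise ValueError("IDX payload is shorter than its header claims")
    return dims, payload


def load_idx_gz(path: str) -> Tuple[List[int], bytes]:
    """Read one of the downloaded .gz files and parse it."""
    with gzip.open(path, "rb") as f:
        return parse_idx(f.read())


def image_rows(dims: List[int], payload: bytes, index: int) -> List[List[int]]:
    """Return image number `index` of an images file as a list of pixel rows (0-255)."""
    n, rows, cols = dims
    if not 0 <= index < n:
        raise IndexError(index)
    start = index * rows * cols
    return [list(payload[start + r * cols:start + (r + 1) * cols]) for r in range(rows)]
```

On the real training images file, `load_idx_gz("train-images-idx3-ubyte.gz")` should report dimensions `[60000, 28, 28]`, and the matching labels file `[60000]`, with one byte per label.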
Figure 1. MNIST image 1 of a '0'.
Figure 2. MNIST image 21 of a '0'.
Figure 3. MNIST image 3 of a '1'.
CIFAR-10 by Ciro Santilli
60,000 32x32 color images in 10 different classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks.
TODO release date.
This dataset can be thought of as an intermediate between the simplicity of MNIST and a more full-blown ImageNet.
ImageNet by Ciro Santilli
14 million images, more than 20k categories, typically denoting prominent objects in the image: either common daily objects, or a wide range of animals. About 1 million of them also have bounding boxes for the objects.
Each image appears to have a single label associated with it. Care must have been taken somehow with categories, since some images contain several possible objects, e.g. a person and some object.
In practice, however, the ILSVRC subset is more commonly used.
Official project page: www.image-net.org/
The data license is restrictive and forbids commercial usage: www.image-net.org/download.php.
The categories are all part of WordNet, which means that parent/child categories are available, such as "dog" vs. specific types of dog. ImageNet1k only appears to have leaf nodes however (i.e. no "dog" label, just specific types of dog).
COCO dataset by Ciro Santilli
From cocodataset.org/:
  • 330K images (>200K labeled)
  • 1.5 million object instances
  • 80 object categories
  • 91 stuff categories
  • 5 captions per image. A caption is a short textual description of the image.
So they have relatively few object labels, but their focus seems to be putting a bunch of objects in the same image. E.g. they have 13 cat plus pizza photos. Searching for such weird combinations is kind of fun.
Their official dataset explorer is actually good: cocodataset.org/#explore
And the objects don't just have bounding boxes, but also detailed segmentation polygons.
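In the COCO annotation JSON, a polygon is stored as a flat list `[x1, y1, x2, y2, ...]` inside the `segmentation` field (one object can have several polygons), and `bbox` is `[x, y, width, height]`. A minimal sketch of getting a bounding box and an area back out of the polygons, assuming non-crowd annotations (crowd annotations use a run-length encoding instead, which this does not handle). The shoelace area may differ slightly from the `area` value stored in the official files, and the annotation below is made up:

```python
from typing import List, Sequence


def polygon_bbox(polygons: Sequence[Sequence[float]]) -> List[float]:
    """Bounding box [x, y, width, height] enclosing all polygons of one object.

    Each polygon is a flat list [x1, y1, x2, y2, ...], as in COCO's "segmentation" field.
    """
    xs = [x for poly in polygons for x in poly[0::2]]
    ys = [y for poly in polygons for y in poly[1::2]]
    x0, y0 = min(xs), min(ys)
    return [x0, y0, max(xs) - x0, max(ys) - y0]


def polygon_area(poly: Sequence[float]) -> float:
    """Area of a single simple polygon with the shoelace formula."""
    xs, ys = poly[0::2], poly[1::2]
    n = len(xs)
    twice_area = 0.0
    for i in range(n):
        j = (i + 1) % n  # wrap around to close the polygon
        twice_area += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(twice_area) / 2.0


# Made-up annotation shaped like a COCO instance annotation (values are illustrative only).
annotation = {
    "image_id": 1,
    "category_id": 1,
    "segmentation": [[10, 10, 50, 10, 50, 40, 10, 40]],
}
print(polygon_bbox(annotation["segmentation"]))  # [10, 10, 40, 30]
print(sum(polygon_area(p) for p in annotation["segmentation"]))  # 1200.0
```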
Also, images have captions describing the relation between objects:
a black and white cat standing on a table next to a pizza.
Epic.
This dataset is kind of cool.
Open Images dataset by Ciro Santilli
TODO vs COCO dataset.
As of v7:
The annotations are under CC BY 4.0 with Google as the copyright holder, while the images are listed as having a CC BY 2.0 license.
Allen brain atlas by Ciro Santilli
Connectome scale by Ciro Santilli
A Drosophila melanogaster has about 135k neurons, and we only managed to reconstruct its connectome in 2023.
The human brain has 86 billion neurons, over 600,000 times more. Therefore, it is obvious that we are very, very far away from a full connectome.
Instead, however, we could look at the connectome at coarser scales, try to extract modules from that, and then reverse engineer things module by module.
This is likely how we are going to "understand how the human brain works".
Some notable connectomes:
Microscopy connectome extraction by Ciro Santilli
Looking from 2020 forward, this is the most plausible way of obtaining a full connectome: cut the brain into very thin slices, e.g. with a microtome, and then observe the slices with an electron microscope plus appropriate staining. Superintelligence by Nick Bostrom (2014) really opened Ciro Santilli's eyes to this possibility.
Once this is done for a human, it will be one of the greatest milestones of humanity, comparable perhaps to the Human Genome Project. But of course, privacy issues are incredibly pressing in this case, even more so than for the Human Genome Project, as we would essentially be able to read the brain of the person after their death.
As of 2022, the Drosophila connectome had been almost fully extracted.
This is also a possible path towards post-mortem brain reading.
Figure 1. Source. Unconfirmed, but looks like the type of frozen brain where a Microtome would be used.
Cultured meat company by Ciro Santilli
Database trigger by Ciro Santilli
Stored procedure by Ciro Santilli
Ciro's call hierarchy notation by Ciro Santilli
This is a simple hierarchical plaintext notation Ciro Santilli created to explain programs to himself.
It is usually created by doing searches in an IDE, and then manually selecting the information of interest.
It attempts to capture not only the call graph itself, including callbacks, but also intuitive information about when things get called or not, by adding some context code.
For example, consider the following pseudocode:
f1() {
}

f2(i) {
  if (i > 5) {
    f1()
  }
}

f3() {
  f1()
  f2_2()
}

f2_2() {
  for (i = 0; i < 10; i++) {

    f2(i)
  }
}

main() {
  f2_2()
  f3()
}
Suppose that we are interested in determining what calls f1.
Then a reasonable call hierarchy for f1 would be:
f2(i)
  if (i > 5) {
    f1()

  f2_2()
    for (i = 0; i < 10; i++) {
      f2(i)

    main()
    f3()
f3()
  main()
Some general principles:
  • start with a regular call tree
  • to include context:
    • remove any blank lines from the snippet of interest
    • add it indented below the function
    • and then follow it up with a blank line
    • and then finally add any callers at the same indentation level
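These principles are applied by hand, but the layout rules can be sketched as a small printer. This is a minimal sketch, assuming a hypothetical `callers` mapping that you would fill in from IDE searches, where each context snippet already includes the call line itself. The pruning rule (a function whose callers are listed elsewhere is printed by name only) is an assumption chosen to reproduce the shape of the f1 example, in which f3 is not expanded a second time under f2_2():

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input format: for each function label, the list of its callers, each with
# the context snippet copied from the caller's body (blank lines already removed).
# An empty snippet means the call has no interesting context.
Callers = Dict[str, List[Tuple[str, List[str]]]]


def call_hierarchy(callers: Callers, target: str) -> str:
    out: List[str] = []
    # Functions whose callers are (or will be) listed elsewhere in the output.
    # Later occurrences are printed by name only, which also stops recursion cycles.
    expanded: Set[str] = {caller for caller, _ in callers.get(target, [])}

    def visit(label: str, snippet: List[str], depth: int, expand: bool) -> None:
        indent = "  " * depth
        out.append(indent + label)
        if snippet:
            # Context goes indented below the function, followed by a blank line.
            for line in snippet:
                out.append(indent + "  " + line)
            out.append("")
        if not expand:
            return
        # Callers go at the same indentation level as the context.
        for caller, caller_snippet in callers.get(label, []):
            first_time = caller not in expanded
            expanded.add(caller)
            visit(caller, caller_snippet, depth + 1, first_time)

    for caller, snippet in callers.get(target, []):
        visit(caller, snippet, 0, True)
    return "\n".join(out)


callers: Callers = {
    "f1": [("f2(i)", ["if (i > 5) {", "  f1()"]), ("f3()", [])],
    "f2(i)": [("f2_2()", ["for (i = 0; i < 10; i++) {", "  f2(i)"])],
    "f2_2()": [("main()", []), ("f3()", [])],
    "f3()": [("main()", [])],
}
print(call_hierarchy(callers, "f1"))
```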
DNA sequencing milestone by Ciro Santilli
Most of these are going to be Whole-genome sequencing of some model organism: en.wikipedia.org/wiki/Whole_genome_sequencing#History lists them all. Basically, the big "firsts" all happened in the 1990s and early 2000s.
Whole-genome sequencing by Ciro Santilli
Delft University of Technology by Ciro Santilli