GNOME Chess by Ciro Santilli 35 Updated +Created
The user friendly Chess UI! Exactly what you would expect from a GNOME Project package. But also packs some punch via the Universal Chess Interface, e.g. Stockfish just works.
GNU Chess by Ciro Santilli 35 Updated +Created
Both a chess engine and a CLI chess UI. As an engine it is likely irrelevant compared to Stockfish as of 2020. TODO: does the UI support Universal Chess Interface?
Cool project history though. Started before the GNU Project itself, and became one of the first packages.
Shane's Chess Information Database by Ciro Santilli 35 Updated +Created
Advanced. Not beginner friendly, very clunky.
Protestantism by Ciro Santilli 35 Updated +Created
Kaggle by Ciro Santilli 35 Updated +Created
To be fair, this is one of the least bad ones.
Forsyth-Edwards Notation by Ciro Santilli 35 Updated +Created
The cool thing about this notation is that it showed Ciro Santilli that there is more state to a chess game than just the board itself! Notably:
  • whose move it is next
  • castling availability
  • en passant availability
plus some other boring draw rules counters.
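As a rough illustration of the extra state listed above, here is a minimal sketch that splits a FEN string into its six space-separated fields. The `FenState` class and `parse_fen` function are hypothetical names made up for this example, and no validation of the board field is attempted.

```python
from dataclasses import dataclass


@dataclass
class FenState:
    """The six space-separated fields of a FEN record (names are illustrative)."""
    board: str            # piece placement, rank 8 down to rank 1
    side_to_move: str     # "w" or "b"
    castling: str         # subset of "KQkq", or "-" if none
    en_passant: str       # target square such as "e3", or "-"
    halfmove_clock: int   # counter for the fifty-move draw rule
    fullmove_number: int  # incremented after each Black move


def parse_fen(fen: str) -> FenState:
    fields = fen.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 FEN fields, got {len(fields)}")
    board, side, castling, en_passant, halfmove, fullmove = fields
    return FenState(board, side, castling, en_passant, int(halfmove), int(fullmove))


# Starting position, and the position after 1. e4.
START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
AFTER_E4 = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"

for fen in (START, AFTER_E4):
    state = parse_fen(fen)
    print(state.side_to_move, state.castling, state.en_passant)
```

Only the first field describes the board; everything after it is the "hidden" state.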
Computerphile by Ciro Santilli 35 Updated +Created
MNIST database by Ciro Santilli 35 Updated +Created
70,000 28x28 grayscale (1 byte per pixel) images of hand-written digits 0-9, i.e. 10 categories. 60k are considered training data, 10k are considered for test data.
Playing with it is the de-facto computer vision hello world.
It was on this dataset that Yann LeCun made great progress with the LeNet model. Running LeNet on MNIST has to be the most classic computer vision thing ever. See e.g. activatedgeek/LeNet-5 for a minimal and modern PyTorch educational implementation.
But it is important to note that as of the 2010s, the benchmark had become too easy for many applications. It is perhaps fair to say that the next big dataset revolution of the same importance was with ImageNet.
The dataset could be downloaded from yann.lecun.com/exdb/mnist/ but as of March 2025 it was down, and it seems to have broken randomly from time to time, so Wayback Machine to the rescue:
wget \
 https://web.archive.org/web/20120828222752/http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz \
 https://web.archive.org/web/20120828182504/http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz \
 https://web.archive.org/web/20240323235739/http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz \
 https://web.archive.org/web/20240328174015/http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
but doing so is kind of pointless as all four files use a crazy custom binary format (IDX) that packs all images or all labels into a single file. OMG!
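That said, the IDX format is simple enough to decode by hand. The following is a minimal sketch, assuming the layout used by the MNIST files: a 4-byte magic number (two zero bytes, a data type byte where 0x08 means unsigned byte, and a byte giving the number of dimensions), followed by one big-endian 32-bit size per dimension, followed by the raw bytes. The function names `parse_idx` and `read_idx_gz` are made up for this example.

```python
import gzip
import struct


def parse_idx(data: bytes):
    """Decode an IDX blob of unsigned bytes into (dims, payload)."""
    zero, dtype, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0 or dtype != 0x08:
        raise ValueError("not an unsigned-byte IDX file")
    header_end = 4 + 4 * ndim
    dims = struct.unpack(">" + "I" * ndim, data[4:header_end])
    payload = data[header_end:]
    expected = 1
    for d in dims:
        expected *= d
    if len(payload) != expected:
        raise ValueError(f"expected {expected} data bytes, got {len(payload)}")
    return dims, payload


def read_idx_gz(path: str):
    """Read one of the downloaded .gz files, e.g. train-images-idx3-ubyte.gz."""
    with gzip.open(path, "rb") as f:
        return parse_idx(f.read())


# Synthetic stand-in for an image file: two 2x2 "images".
fake_images = struct.pack(">HBBIII", 0, 0x08, 3, 2, 2, 2) + bytes(range(8))
dims, pixels = parse_idx(fake_images)
print(dims, list(pixels))

# Synthetic stand-in for a label file: three labels.
fake_labels = struct.pack(">HBBI", 0, 0x08, 1, 3) + bytes([5, 0, 4])
label_dims, labels = parse_idx(fake_labels)
print(label_dims, list(labels))
```

On the real training image file, `dims` should come out as `(60000, 28, 28)`, so image `i` is the 784-byte slice starting at offset `i * 784` of the payload.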
CIFAR-10 by Ciro Santilli 35 Updated +Created
60,000 32x32 color images in 10 different classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks.
TODO release date.
This dataset can be thought of as an intermediate between the simplicity of MNIST, and a more full blown ImageNet.
ImageNet by Ciro Santilli 35 Updated +Created
14 million images with more than 20k categories, typically denoting prominent objects in the image, either common daily objects, or a wide range of animals. About 1 million of them also have bounding boxes for the objects. The images have different sizes: they are not all standardized to a single size like MNIST.
Each image appears to have a single label associated with it. Care must have been taken somehow with categories, since some images contain several possible objects, e.g. a person and some object.
In practice, the ILSVRC subset of ImageNet is the most commonly used dataset.
Official project page: www.image-net.org/
The data license is restrictive and forbids commercial usage: www.image-net.org/download.php. As a result, you also have to log in to download the dataset. Super annoying.
The categories are all part of WordNet, which means that there are several parent/child categories such as dog vs type of dog available. ImageNet1k only appears to have leaf nodes however (i.e. no "dog" label, just specific types of dog).
A major model that performed well on ImageNet starting in 2012 and became notable is AlexNet.
COCO dataset by Ciro Santilli 35 Updated +Created
From cocodataset.org/:
  • 330K images (>200K labeled)
  • 1.5 million object instances
  • 80 object categories
  • 91 stuff categories
  • 5 captions per image. A caption is a short textual description of the image.
So they have relatively few object labels, but their focus seems to be putting a bunch of objects on the same image. E.g. they have 13 cat plus pizza photos. Searching for such weird combinations is kind of fun.
Their official dataset explorer is actually good: cocodataset.org/#explore
And the objects don't just have bounding boxes, but detailed polygons.
Also, images have captions describing the relation between objects:
a black and white cat standing on a table next to a pizza.
Epic.
This dataset is kind of cool.
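To make the difference between bounding boxes and polygons mentioned above concrete, here is a small sketch using a COCO-style annotation, where a polygon is stored as a flat `[x1, y1, x2, y2, ...]` list under `segmentation` and a box as `[x, y, width, height]`. The annotation values below are invented for illustration and do not come from the actual dataset; the helper names are also hypothetical.

```python
def polygon_area(flat):
    """Area of a polygon given as a flat [x1, y1, x2, y2, ...] list (shoelace formula)."""
    xs, ys = flat[0::2], flat[1::2]
    n = len(xs)
    twice_area = sum(
        xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n)
    )
    return abs(twice_area) / 2


def polygon_bbox(flat):
    """Tightest [x, y, width, height] box around the polygon."""
    xs, ys = flat[0::2], flat[1::2]
    x0, y0 = min(xs), min(ys)
    return [x0, y0, max(xs) - x0, max(ys) - y0]


# Made-up annotation: a triangle with corners (10, 10), (50, 10), (30, 40).
annotation = {
    "image_id": 1,      # illustrative value
    "category_id": 1,   # illustrative value
    "segmentation": [[10, 10, 50, 10, 30, 40]],
}

polygon = annotation["segmentation"][0]
bbox = polygon_bbox(polygon)
print("bbox:", bbox)
print("bbox area:", bbox[2] * bbox[3])
print("polygon area:", polygon_area(polygon))
```

For this triangle the polygon covers only half of its bounding box, which is the kind of detail a box-only annotation throws away.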
Open Images dataset by Ciro Santilli 35 Updated +Created
As of v7:
The images and annotations are both under CC BY, with Google as the copyright holder.
Allen brain atlas by Ciro Santilli 35 Updated +Created
Connectome scale by Ciro Santilli 35 Updated +Created
A Drosophila melanogaster has about 135k neurons, and we only managed to reconstruct its connectome in 2023.
The human brain has about 86 billion neurons, roughly 600,000 times more. It is therefore clear that we are very far away from a full human connectome.
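A quick sanity check of the scale gap, using only the two neuron counts quoted above (this counts neurons only, not the connections between them):

```python
fly_neurons = 135_000            # Drosophila melanogaster, as quoted above
human_neurons = 86_000_000_000   # human brain, as quoted above

ratio = human_neurons / fly_neurons
print(f"{ratio:,.0f}")  # 637,037
```

So the gap is a bit under six orders of magnitude, i.e. on the order of a million.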
Instead, however, we could look at the connectome at larger scales, try to extract modules from that, and then reverse engineer things module by module.
This is likely how we are going to "understand how the human brain works".
Microscopy connectome extraction by Ciro Santilli 35 Updated +Created
This is the most plausible way of obtaining a full connectome looking from 2020 forward. The idea is to cut the brain into very thin slices, and then observe the slices with an electron microscope + appropriate Staining. Superintelligence by Nick Bostrom (2014) really opened Ciro Santilli's eyes to this possibility.
Once this is done for a human, it will be one of the greatest milestones of humanity, comparable perhaps to the Human Genome Project. But of course, privacy issues are incredibly pressing in this case, even more than in the Human Genome Project, as we would essentially be able to read the brain of the person after their death.
As of 2022, the Drosophila connectome had been almost fully extracted.
This is also a possible path towards post-mortem brain reading.
Figure 1. Source. Unconfirmed, but looks like the type of frozen brain where a Microtome would be used.
Cultured meat company by Ciro Santilli 35 Updated +Created
Database trigger by Ciro Santilli 35 Updated +Created
