Is there nothing standardized besides just raw images?

E.g. www.nist.gov/system/files/documents/2021/02/25/ansi-nist_2007_griffin-face-std-m1.pdf from 2005 by NIST says:

Specify face images because there is no agreement on a standard face recognition template - Unlike finger minutiae ...

so it contrasts faces with fingerprints, which do have standard template formats such as ISO 19794-2. Sad!

Face clustering

 0  0

Given multiple images, decide how many people show up in these images and when each person shows up.

One particular case of this is for videos, where you also have a timestamp for each image, and way more data.
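A minimal sketch of the task's input and output, assuming each detected face has already been turned into an embedding vector by some face recognition model (not shown, and not specified in these notes). The greedy threshold clustering, the `cluster_faces` name and the 0.8 cosine similarity cutoff are illustrative assumptions, not a recommended algorithm:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def cluster_faces(embeddings, threshold=0.8):
    """Greedy clustering: assign each face to the first cluster whose
    representative (its first member) is similar enough, else open a new one.

    Returns a list of clusters, each a list of indices into `embeddings`.
    """
    clusters = []
    for i, emb in enumerate(embeddings):
        for members in clusters:
            if cosine(embeddings[members[0]], emb) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical 2D embeddings for four face crops, in timestamp order.
faces = [(1.0, 0.0), (0.0, 1.0), (0.9, 0.1), (0.1, 0.9)]
clusters = cluster_faces(faces)
print(len(clusters))  # number of distinct people found
print(clusters)       # which images (or video frames) each person appears in
```

The number of clusters answers "how many people", and the indices inside each cluster answer "when each person shows up" if the inputs are ordered by timestamp.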

Bibliography:

Pre-trained computer vision model

 0  0

 Tagged

torchvision

Pre-trained computer vision model CLI

 0  0

yolov5-pip

 0  0

github.com/fcakyon/yolov5-pip

OK, now we're talking, two liner and you get a window showing bounding box object detection from your webcam feed!

python -m pip install -U yolov5==7.0.9
yolov5 detect --source 0

The accuracy is crap for anything but people. But still. Well done. Tested on Ubuntu 22.10, P51.

Video 1. fcakyon/yolov5-pip webcam object detection demo by Ciro Santilli (2023).

MNIST database (1998)

 0  0

70,000 28x28 grayscale (1 byte per pixel) images of hand-written digits 0-9, i.e. 10 categories. 60k are training data and 10k are test data.

This is THE "OG" computer vision dataset.

Playing with it is the de-facto computer vision hello world.

It was on this dataset that Yann LeCun made great progress with the LeNet model. Running LeNet on MNIST has to be the most classic computer vision thing ever. See e.g. activatedgeek/LeNet-5 for a minimal and modern PyTorch educational implementation.

But it is important to note that as of the 2010s, the benchmark had become too easy for many applications. It is perhaps fair to say that the next big dataset revolution of the same importance was with ImageNet.

The dataset could be downloaded from yann.lecun.com/exdb/mnist/ but as of March 2025 it was down, and it seems to break randomly from time to time, so Wayback Machine to the rescue:

wget \
 https://web.archive.org/web/20120828222752/http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz \
 https://web.archive.org/web/20120828182504/http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz \
 https://web.archive.org/web/20240323235739/http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz \
 https://web.archive.org/web/20240328174015/http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz

but doing so is kind of pointless, as each file uses a crazy custom binary format that packs all images or all labels into a single file. OMG!
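That said, the format (usually called IDX, as the file names hint) is simple enough to parse by hand. A minimal sketch, assuming the layout described on the original MNIST page: a 4-byte magic number whose third byte is the data type (0x08 for unsigned byte) and whose fourth byte is the number of dimensions, followed by one big-endian 32-bit size per dimension, then the raw data. The `parse_idx` and `load_idx_gz` names and the tiny synthetic input are illustrative only:

```python
import gzip
import struct


def parse_idx(data):
    """Parse an IDX byte string of unsigned bytes into (dims, payload)."""
    zero, dtype, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0 or dtype != 0x08:
        raise ValueError("not an unsigned-byte IDX file")
    dims = struct.unpack(">" + "I" * ndim, data[4:4 + 4 * ndim])
    payload = data[4 + 4 * ndim:]
    expected = 1
    for d in dims:
        expected *= d
    if len(payload) != expected:
        raise ValueError("payload size does not match header")
    return dims, payload


def load_idx_gz(path):
    """Read a gzipped IDX file such as train-images-idx3-ubyte.gz."""
    with gzip.open(path, "rb") as f:
        return parse_idx(f.read())


# Synthetic stand-in for an images file: 2 images of 2x3 pixels.
fake = struct.pack(">HBBIII", 0, 0x08, 3, 2, 2, 3) + bytes(range(12))
dims, pixels = parse_idx(fake)
count, rows, cols = dims
# Image i occupies bytes [i * rows * cols, (i + 1) * rows * cols).
second_image = pixels[rows * cols:2 * rows * cols]
print(dims, list(second_image))
```

On the real files, `load_idx_gz` should report dimensions of (60000, 28, 28) for train-images-idx3-ubyte.gz and (60000,) for train-labels-idx1-ubyte.gz, given the counts above.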

OK-ish data explorer: knowyourdata-tfds.withgoogle.com/#tab=STATS&dataset=mnist

Extract MNIST images

 0  0

Best algorithm for MNIST

 0  0

The table: en.wikipedia.org/w/index.php?title=MNIST_database&oldid=1152541822#Classifiers

Fashion MNIST (2017)

 0  0

Same style as MNIST: 28x28 grayscale images, but with clothes rather than hand-written digits.

It was designed to be much harder than MNIST, and more representative of modern applications, while still retaining the low resolution of MNIST for simplicity of training.

CIFAR-10

 0  0

www.cs.toronto.edu/~kriz/cifar.html

60,000 tiny 32x32 color images in 10 different classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks.

TODO release date.

This dataset can be thought of as an intermediate between the simplicity of MNIST, and a more full blown ImageNet.

https://web.archive.org/web/20250517192041im_/https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane1.png

https://web.archive.org/web/20250517192041im_/https://www.cs.toronto.edu/~kriz/cifar-10-sample/automobile1.png

https://web.archive.org/web/20250517192041im_/https://www.cs.toronto.edu/~kriz/cifar-10-sample/bird1.png

https://web.archive.org/web/20250517192041im_/https://www.cs.toronto.edu/~kriz/cifar-10-sample/cat1.png

Toronto faces dataset (TFD)

 0  0

TODO where to find it: www.kaggle.com/general/50987

Cited on original Generative adversarial network paper: proceedings.neurips.cc/paper_files/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf

ImageNet (2009)

 0  0

14 million images with more than 20k categories, typically denoting prominent objects in the image, either common daily objects, or a wide range of animals. About 1 million of them also have bounding boxes for the objects. The images have different sizes: they are not all standardized to a single size like MNIST.

Each image appears to have a single label associated to it. Care must have been taken somehow with categories, since some images contain several possible objects, e.g. a person and some object.

In practice, the ILSVRC subset of ImageNet is the most commonly used dataset.

Official project page: www.image-net.org/

The data license is restrictive and forbids commercial usage: www.image-net.org/download.php. Also as a result you have to log in to download the dataset. Super annoying.

How to visualize: datascience.stackexchange.com/questions/111756/where-can-i-view-the-imagenet-classes-as-a-hierarchy-on-wordnet

The categories are all part of WordNet, which means that there are several parent/child categories such as dog vs type of dog available. ImageNet1k only appears to have leaf nodes however (i.e. no "dog" label, just specific types of dog).

A major model that performed well on ImageNet starting in 2012 and became notable is AlexNet.

Fei-Fei Li

 1  0

 Tagged

Stanford Vision and Learning Lab

ImageNet subset

 0  0

Subset generators:

github.com/mf1024/ImageNet-datasets-downloader generates on download, very good. As per github.com/mf1024/ImageNet-Datasets-Downloader/issues/14 counts go over the limit due to bad multithreading. Also unfortunately it does not start with a subset of 1k.
github.com/BenediktAlkin/ImageNetSubsetGenerator

Unfortunately, since ImageNet is a closed standard, no one can upload such pre-made subsets, forcing everybody to download the full dataset, e.g. ImageNet1k, which is huge!

Imagenette (Imagenet10)

 0  0

github.com/fastai/imagenette

An imagenet10 subset by fast.ai.

Size of full sized image version: 1.5 GB.

ImageNet Large Scale Visual Recognition Challenge dataset (ILSVRC, ImageNet1k)

 0  0

Subset of ImageNet. About 167.62 GB in size according to www.kaggle.com/competitions/imagenet-object-localization-challenge/data.

Contains 1,281,167 images and exactly 1k categories which is why this dataset is also known as ImageNet1k: datascience.stackexchange.com/questions/47458/what-is-the-difference-between-imagenet-and-imagenet1k-how-to-download-it

www.kaggle.com/competitions/imagenet-object-localization-challenge/overview clarifies a bit further how the categories are inter-related according to WordNet relationships:

The 1000 object categories contain both internal nodes and leaf nodes of ImageNet, but do not overlap with each other.

image-net.org/challenges/LSVRC/2012/browse-synsets.php lists all 1k labels with their WordNet IDs.

n02119789: kit fox, Vulpes macrotis
n02100735: English setter
n02096294: Australian terrier

There is a bug on that page however towards the middle:

n03255030: dumbbell
href="ht:
n02102040: English springer, English springer spaniel

and there is one missing label if we ignore that dummy href= line. A thing of beauty!

Also the lines are not sorted by synset ID. If we sort them, the first three lines are:

n01440764: tench, Tinca tinca
n01443537: goldfish, Carassius auratus
n01484850: great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias

gist.github.com/aaronpolhamus/964a4411c0906315deb9f4a3723aac57 has lines of type:

n02119789 1 kit_fox
n02100735 2 English_setter
n02110185 3 Siberian_husky

therefore numbered in the exact same order as image-net.org/challenges/LSVRC/2012/browse-synsets.php

gist.github.com/yrevar/942d3a0ac09ec9e5eb3a lists all 1k labels as a plaintext file with their benchmark IDs.

{0: 'tench, Tinca tinca',
 1: 'goldfish, Carassius auratus',
 2: 'great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias',

therefore numbered in the sorted order of image-net.org/challenges/LSVRC/2012/browse-synsets.php

The official line numbering in-benchmark-data can be seen at LOC_synset_mapping.txt, e.g. www.kaggle.com/competitions/imagenet-object-localization-challenge/data?select=LOC_synset_mapping.txt

n01440764 tench, Tinca tinca
n01443537 goldfish, Carassius auratus
n01484850 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias
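The relation between these numberings can be sketched in a few lines. This assumes the LOC_synset_mapping.txt layout shown above (WordNet ID, one space, then the label) and that the 0-based benchmark ID is simply the line position; `parse_synset_mapping` is a hypothetical helper name, and the sample is limited to lines quoted in this section:

```python
def parse_synset_mapping(lines):
    """Map 0-based class index -> (WordNet ID, label), in the given order."""
    mapping = {}
    for index, line in enumerate(lines):
        # Split only on the first space: labels themselves contain spaces.
        wnid, label = line.rstrip("\n").split(" ", 1)
        mapping[index] = (wnid, label)
    return mapping


# A few lines in browse-synsets.php order, reformatted as "wnid label".
browse_order = [
    "n02119789 kit fox, Vulpes macrotis",
    "n02100735 English setter",
    "n01440764 tench, Tinca tinca",
    "n01443537 goldfish, Carassius auratus",
]

# Sorting by WordNet ID reproduces the LOC_synset_mapping.txt order.
# Indices are only relative to this 4-line sample, not the real 1k list.
by_wnid = parse_synset_mapping(sorted(browse_order))
print(by_wnid[0])
print(by_wnid[1])
```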

huggingface.co/datasets/imagenet-1k also has some useful metrics on the split:

train: 1,281,167 images, 145.7 GB zipped
validation: 50,000 images, 6.67 GB zipped
test: 100,000 images, 13.5 GB zipped

ImageNet1k download

 0  0

The official page: www.image-net.org/challenges/LSVRC/index.php points to a download link on Kaggle: www.kaggle.com/competitions/imagenet-object-localization-challenge/data. Kaggle says that the size is 167.62 GB!

To download from Kaggle, create an API token on kaggle.com, which downloads a kaggle.json file, then:

mkdir -p ~/.kaggle
mv ~/down/kaggle.json ~/.kaggle
python3 -m pip install kaggle
kaggle competitions download -c imagenet-object-localization-challenge

The download speed is heavily limited on the server side and the download takes A LOT of hours. Also, the tool does not seem able to pick up where you stopped last time.

Another download location appears to be: huggingface.co/datasets/imagenet-1k on Hugging Face, but you have to log in due to their license terms. Once you log in you have a very basic data explorer available: huggingface.co/datasets/imagenet-1k/viewer/default/train.

Bibliography:

ImageNet competition

 0  0

ImageNet 2015

 0  0

COCO dataset (2014)

 0  0

cocodataset.org

From cocodataset.org/:

330K images (>200K labeled)
1.5 million object instances
80 object categories
91 stuff categories
5 captions per image. A caption is a short textual description of the image.

So they have relatively few object labels, but their focus seems to be putting a bunch of objects on the same image. E.g. they have 13 cat plus pizza photos. Searching for such weird combinations is kind of fun.
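Such combination searches are easy to sketch against the annotation JSON. This assumes the usual COCO layout with top-level `images`, `annotations` (each with `image_id` and `category_id`) and `categories` (each with `id` and `name`) lists. The tiny dictionary below is made up for illustration and `images_with_all` is a hypothetical helper:

```python
from collections import defaultdict


def images_with_all(coco, names):
    """Return sorted IDs of images containing every category in `names`."""
    name_to_id = {c["name"]: c["id"] for c in coco["categories"]}
    wanted = {name_to_id[n] for n in names}
    seen = defaultdict(set)  # image_id -> set of category IDs present
    for ann in coco["annotations"]:
        seen[ann["image_id"]].add(ann["category_id"])
    return sorted(i for i, cats in seen.items() if wanted <= cats)


# Made-up miniature dataset, not real COCO data or real COCO category IDs.
coco = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "categories": [{"id": 10, "name": "cat"}, {"id": 20, "name": "pizza"}],
    "annotations": [
        {"image_id": 1, "category_id": 10},
        {"image_id": 1, "category_id": 20},
        {"image_id": 2, "category_id": 10},
        {"image_id": 3, "category_id": 20},
    ],
}
print(images_with_all(coco, ["cat", "pizza"]))
```

For the real dataset, `coco` would instead be loaded with `json.load` from one of the downloaded annotation files.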

Their official dataset explorer is actually good: cocodataset.org/#explore

And the objects don't just have bounding boxes, but detailed polygons.

Also, images have captions describing the relation between objects: