Ciro Santilli @cirosantilli 37

AlexNet Created 2025-03-20 Updated 2025-07-16

Became notable for performing extremely well on ImageNet starting in 2012.

It is also notable for being one of the first to make successful use of GPU training rather than CPU training.

CIFAR-10 Updated 2025-07-16

www.cs.toronto.edu/~kriz/cifar.html

60,000 tiny 32x32 color images in 10 different classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks.

TODO release date.

This dataset can be thought of as an intermediate between the simplicity of MNIST and the more full-blown ImageNet.

https://web.archive.org/web/20250517192041im_/https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane1.png

https://web.archive.org/web/20250517192041im_/https://www.cs.toronto.edu/~kriz/cifar-10-sample/automobile1.png

https://web.archive.org/web/20250517192041im_/https://www.cs.toronto.edu/~kriz/cifar-10-sample/bird1.png

https://web.archive.org/web/20250517192041im_/https://www.cs.toronto.edu/~kriz/cifar-10-sample/cat1.png

ImageNet Updated 2025-07-16

14 million images with more than 20k categories, typically denoting prominent objects in the image, either common daily objects, or a wide range of animals. About 1 million of them also have bounding boxes for the objects. The images have different sizes: they are not all standardized to a single size like MNIST ^[ref].

Each image appears to have a single label associated with it. Care must have been taken somehow with categories, since some images contain several possible objects, e.g. a person and some object.

In practice, the ILSVRC subset of ImageNet is the most commonly used dataset.

Official project page: www.image-net.org/

The data license is restrictive and forbids commercial usage: www.image-net.org/download.php. As a result, you also have to log in to download the dataset. Super annoying.

How to visualize: datascience.stackexchange.com/questions/111756/where-can-i-view-the-imagenet-classes-as-a-hierarchy-on-wordnet

The categories are all part of WordNet, which means that there are several parent/child categories such as dog vs type of dog available. ImageNet1k only appears to have leaf nodes however (i.e. no "dog" label, just specific types of dog).

A major model that performed well on ImageNet starting in 2012 and became notable is AlexNet.

ImageNet Large Scale Visual Recognition Challenge dataset Updated 2025-07-16

Subset of ImageNet. About 167.62 GB in size according to www.kaggle.com/competitions/imagenet-object-localization-challenge/data.

Contains 1,281,167 images and exactly 1k categories, which is why this dataset is also known as ImageNet1k: datascience.stackexchange.com/questions/47458/what-is-the-difference-between-imagenet-and-imagenet1k-how-to-download-it

www.kaggle.com/competitions/imagenet-object-localization-challenge/overview clarifies a bit further how the categories are inter-related according to WordNet relationships:

The 1000 object categories contain both internal nodes and leaf nodes of ImageNet, but do not overlap with each other.

image-net.org/challenges/LSVRC/2012/browse-synsets.php lists all 1k labels with their WordNet IDs.

n02119789: kit fox, Vulpes macrotis
n02100735: English setter
n02096294: Australian terrier

There is a bug on that page however towards the middle:

n03255030: dumbbell
href="ht:
n02102040: English springer, English springer spaniel

and there is one missing label if we ignore that dummy href= line. A thing of beauty!

Also the lines are not sorted by synset. If we sort them, then the first three lines are:

n01440764: tench, Tinca tinca
n01443537: goldfish, Carassius auratus
n01484850: great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias
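The cleanup and sort described above can be sketched in a few lines of Python. This is only an illustration on a handful of lines copied from this article, not the full page: the sample order and the helper names `parse_synsets` and `sorted_synsets` are made up for the example, and the regular expression assumes every valid line has the shape "n" + 8 digits + ": " + label.

```python
import re

# A few lines in the "wnid: label" shape shown above, including the
# malformed href= line from the middle of the page.
page_lines = [
    "n02119789: kit fox, Vulpes macrotis",
    "n03255030: dumbbell",
    'href="ht:',
    "n02102040: English springer, English springer spaniel",
    "n01443537: goldfish, Carassius auratus",
    "n01440764: tench, Tinca tinca",
]

# Assumption: a valid line is "n" followed by 8 digits, ": ", then the label.
SYNSET_LINE = re.compile(r"^(n\d{8}): (.+)$")


def parse_synsets(lines):
    """Return (wnid, label) pairs, silently dropping lines that do not match."""
    pairs = []
    for line in lines:
        match = SYNSET_LINE.match(line.strip())
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


def sorted_synsets(lines):
    """Sort the parsed pairs by WordNet ID."""
    return sorted(parse_synsets(lines))


for index, (wnid, label) in enumerate(sorted_synsets(page_lines)):
    print(index, wnid, label)
```

Dropping non-matching lines rather than failing is a deliberate choice here, since the page is known to contain at least one junk line.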

gist.github.com/aaronpolhamus/964a4411c0906315deb9f4a3723aac57 has lines of type:

n02119789 1 kit_fox
n02100735 2 English_setter
n02110185 3 Siberian_husky

therefore numbered in the exact same order as image-net.org/challenges/LSVRC/2012/browse-synsets.php

gist.github.com/yrevar/942d3a0ac09ec9e5eb3a lists all 1k labels as a plaintext file with their benchmark IDs.

{0: 'tench, Tinca tinca',
 1: 'goldfish, Carassius auratus',
 2: 'great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias',

therefore numbered in the sorted order of image-net.org/challenges/LSVRC/2012/browse-synsets.php

The official line numbering in-benchmark-data can be seen at LOC_synset_mapping.txt, e.g. www.kaggle.com/competitions/imagenet-object-localization-challenge/data?select=LOC_synset_mapping.txt

n01440764 tench, Tinca tinca
n01443537 goldfish, Carassius auratus
n01484850 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias
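A minimal sketch of turning that file into an index lookup, assuming each line is a WordNet ID, one space, then the label, and assuming the 0-based line number is the class index (which is consistent with the gist above starting at 0: tench, but not otherwise verified here). `parse_synset_mapping` is a hypothetical helper name and the sample text is just the three lines quoted above.

```python
def parse_synset_mapping(text):
    """Map 0-based line index -> (wnid, label) for 'wnid label' lines."""
    lines = [line for line in text.splitlines() if line.strip()]
    mapping = {}
    for index, line in enumerate(lines):
        # Split only on the first space: labels themselves contain spaces.
        wnid, label = line.split(" ", 1)
        mapping[index] = (wnid, label)
    return mapping


sample = """n01440764 tench, Tinca tinca
n01443537 goldfish, Carassius auratus
n01484850 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias"""

mapping = parse_synset_mapping(sample)
for index, (wnid, label) in mapping.items():
    print(index, wnid, label)
```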

huggingface.co/datasets/imagenet-1k also has some useful metrics on the split:

train: 1,281,167 images, 145.7 GB zipped
validation: 50,000 images, 6.67 GB zipped
test: 100,000 images, 13.5 GB zipped
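As a quick sanity check on those numbers, the totals and a rough average file size per image can be computed as below. The only inputs are the three lines above; treating 1 GB as 10^9 bytes is an assumption, since the page does not say which unit it uses.

```python
# (image count, zipped size in GB) per split, copied from the lines above.
splits = {
    "train": (1_281_167, 145.7),
    "validation": (50_000, 6.67),
    "test": (100_000, 13.5),
}

total_images = sum(count for count, _ in splits.values())
total_gb = sum(size_gb for _, size_gb in splits.values())
print(f"total: {total_images:,} images, {total_gb:.2f} GB zipped")

for name, (count, size_gb) in splits.items():
    # Assumption: 1 GB = 10**9 bytes.
    kb_per_image = size_gb * 1e9 / count / 1e3
    print(f"{name}: ~{kb_per_image:.0f} kB per image")
```

The total comes out at 1,431,167 images and about 165.87 GB, in the same ballpark as the 167.62 GB reported on Kaggle.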

ImageNet subset Updated 2025-07-16

Subset generators:

github.com/mf1024/ImageNet-datasets-downloader generates on download, very good. As per github.com/mf1024/ImageNet-Datasets-Downloader/issues/14 counts go over the limit due to bad multithreading. Also unfortunately it does not start with a subset of 1k.
github.com/BenediktAlkin/ImageNetSubsetGenerator

Unfortunately, since ImageNet is a closed standard, no one can upload such pre-made subsets, forcing everybody to download the full dataset, i.e. ImageNet1k, which is huge!

MLperf Updated 2025-07-16

mlcommons.org/en/ Their homepage is not amazingly organized, but it does the job.

Benchmark focused on deep learning. It has two parts:

training: produces a trained network
inference: uses the trained network

Furthermore, a specific network model is specified for each benchmark in the closed category: so it goes beyond just specifying the dataset.

Results can be seen e.g. at:

Those URLs broke as of 2025 of course, now you have to click on their Tableau down to the 2.1 round and there's no fixed URL for it:

And there are also separate repositories for each:

E.g. on mlcommons.org/en/training-normal-21/ we can see what the benchmarks are:

Dataset	Model
ImageNet	ResNet
KiTS19	3D U-Net
OpenImages	RetinaNet
COCO dataset	Mask R-CNN
LibriSpeech	RNN-T
Wikipedia	BERT
1TB Clickthrough	DLRM
Go	MiniGo

MLperf v2.1 ResNet Updated 2025-07-16

Instructions at:

Ubuntu 22.10 setup with tiny dummy manually generated ImageNet and run on ONNX:

sudo apt install pybind11-dev

git clone https://github.com/mlcommons/inference
cd inference
git checkout v2.1

virtualenv -p python3 .venv
. .venv/bin/activate
pip install numpy==1.24.2 pycocotools==2.0.6 onnxruntime==1.14.1 opencv-python==4.7.0.72 torch==1.13.1

cd loadgen
CFLAGS="-std=c++14" python setup.py develop
cd -

cd vision/classification_and_detection
python setup.py develop
wget -q https://zenodo.org/record/3157894/files/mobilenet_v1_1.0_224.onnx
export MODEL_DIR="$(pwd)"
export EXTRA_OPS='--time 10 --max-latency 0.2'

tools/make_fake_imagenet.sh
DATA_DIR="$(pwd)/fake_imagenet" ./run_local.sh onnxruntime mobilenet cpu --accuracy

Last line of output on P51, which appears to contain the benchmark results:

TestScenario.SingleStream qps=58.85, mean=0.0138, time=0.136, acc=62.500%, queries=8, tiles=50.0:0.0129,80.0:0.0137,90.0:0.0155,95.0:0.0171,99.0:0.0184,99.9:0.0187

where presumably qps means queries per second, and is the main result we are interested in, the more the better.
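To compare runs it helps to pull the numbers out of that line. The sketch below parses the exact line shown above; `parse_result_line` is a hypothetical helper, the field layout is inferred from this single sample rather than from any documented format, and reading `tiles` as latency percentiles is a guess.

```python
line = (
    "TestScenario.SingleStream qps=58.85, mean=0.0138, time=0.136, "
    "acc=62.500%, queries=8, "
    "tiles=50.0:0.0129,80.0:0.0137,90.0:0.0155,95.0:0.0171,99.0:0.0184,99.9:0.0187"
)


def parse_result_line(text):
    """Split a result line into (scenario, scalar fields, tiles mapping)."""
    scenario, rest = text.split(" ", 1)
    head, tiles = rest.split(", tiles=")
    fields = {}
    for item in head.split(", "):
        key, value = item.split("=")
        fields[key] = float(value.rstrip("%"))
    # Guess: "tiles" maps a percentile to a latency.
    percentiles = {}
    for item in tiles.split(","):
        percentile, latency = item.split(":")
        percentiles[float(percentile)] = float(latency)
    return scenario, fields, percentiles


scenario, fields, percentiles = parse_result_line(line)
print(scenario, fields["qps"])
# If qps really is queries per second, queries / time should be close to it.
print(fields["queries"] / fields["time"])
```

Here queries / time gives about 58.8, close to the reported qps=58.85, which supports the "queries per second" reading without proving it.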

Running:

tools/make_fake_imagenet.sh

produces a tiny ImageNet subset with 8 images under fake_imagenet/.

fake_imagenet/val_map.txt contains:

val/800px-Porsche_991_silver_IAA.jpg 817
val/512px-Cacatua_moluccensis_-Cincinnati_Zoo-8a.jpg 89
val/800px-Sardinian_Warbler.jpg 13
val/800px-7weeks_old.JPG 207
val/800px-20180630_Tesla_Model_S_70D_2015_midnight_blue_left_front.jpg 817
val/800px-Welsh_Springer_Spaniel.jpg 156
val/800px-Jammlich_crop.jpg 233
val/782px-Pumiforme.JPG 285

where the numbers are the category indices from ImageNet1k. At gist.github.com/yrevar/942d3a0ac09ec9e5eb3a see e.g.:

817: 'sports car, sport car',
89: 'sulphur-crested cockatoo, Kakatoe galerita, Cacatua galerita',

and so on, so they are coherent with the image names. By quickly looking at the script we see that it just downloads from Wikimedia and manually creates the file.
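The coherence check can also be done mechanically. A minimal sketch, using three of the val_map.txt lines and the two labels quoted above: `parse_val_map` is a hypothetical helper, and the format (path, one space, integer index) is assumed from the sample.

```python
# Three of the val_map.txt lines shown above.
val_map_text = """val/800px-Porsche_991_silver_IAA.jpg 817
val/512px-Cacatua_moluccensis_-Cincinnati_Zoo-8a.jpg 89
val/800px-20180630_Tesla_Model_S_70D_2015_midnight_blue_left_front.jpg 817"""

# Only the two labels quoted above; a real run would load all 1k of them.
labels = {
    817: "sports car, sport car",
    89: "sulphur-crested cockatoo, Kakatoe galerita, Cacatua galerita",
}


def parse_val_map(text):
    """Return (path, class index) pairs from 'path index' lines."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the last space so the index is always the final token.
        path, index = line.rsplit(" ", 1)
        entries.append((path, int(index)))
    return entries


for path, index in parse_val_map(val_map_text):
    print(f"{index:>3} {labels.get(index, '?')}: {path}")
```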

TODO prepare and test on the actual ImageNet validation set, README says:

Prepare the imagenet dataset to come.

Since that one is undocumented, let's try the COCO dataset instead, which uses COCO 2017 and is also a bit smaller. Note that this is not part of MLperf anymore since v2.1, only ImageNet and OpenImages are used. But still:

wget https://zenodo.org/record/4735652/files/ssd_mobilenet_v1_coco_2018_01_28.onnx
DATA_DIR_BASE=/mnt/data/coco
export DATA_DIR="${DATA_DIR_BASE}/val2017-300"
mkdir -p "$DATA_DIR_BASE"
cd "$DATA_DIR_BASE"
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
unzip val2017.zip
unzip annotations_trainval2017.zip
mv annotations val2017
cd -
cd "$(git rev-parse --show-toplevel)"
python tools/upscale_coco/upscale_coco.py --inputs "$DATA_DIR_BASE" --outputs "$DATA_DIR" --size 300 300 --format png
cd -

Now:

./run_local.sh onnxruntime mobilenet cpu --accuracy

fails immediately with:

No such file or directory: '/path/to/coco/val2017-300/val_map.txt

The more plausible looking:

./run_local.sh onnxruntime mobilenet cpu --accuracy --dataset coco-300

first takes a while to preprocess something most likely, which it does only once, and then fails:

Traceback (most recent call last):
  File "/home/ciro/git/inference/vision/classification_and_detection/python/main.py", line 596, in <module>
    main()
  File "/home/ciro/git/inference/vision/classification_and_detection/python/main.py", line 468, in main
    ds = wanted_dataset(data_path=args.dataset_path,
  File "/home/ciro/git/inference/vision/classification_and_detection/python/coco.py", line 115, in __init__
    self.label_list = np.array(self.label_list)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (5000, 2) + inhomogeneous part.

TODO!
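An untested guess at the cause: since NumPy 1.24 (the setup above pins numpy==1.24.2), building an array from a ragged nested list raises ValueError instead of silently producing an object array with a deprecation warning. The sketch below reproduces that behavior on a made-up `label_list`. It does not use the real data from coco.py, and whether passing dtype=object there is enough to make the benchmark run has not been verified.

```python
import warnings

import numpy as np

# Hypothetical stand-in for the label list: each row pairs an id with a
# variable-length list, so the nesting is ragged below the second level.
label_list = [[1, [10, 20]], [2, [30]]]

try:
    with warnings.catch_warnings():
        # Older NumPy versions only warn here; turn that into an error too.
        warnings.simplefilter("error")
        np.array(label_list)
    ragged_rejected = False
except (ValueError, Warning):
    ragged_rejected = True

# Asking for dtype=object explicitly keeps the inner lists as Python objects.
as_objects = np.array(label_list, dtype=object)
print(ragged_rejected, as_objects.dtype, len(as_objects))
```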

MNIST database Updated 2025-07-16

70,000 28x28 grayscale (1 byte per pixel) images of hand-written digits 0-9, i.e. 10 categories. 60k are considered training data, 10k are considered test data.

This is THE "OG" computer vision dataset.

Playing with it is the de-facto computer vision hello world.

It was on this dataset that Yann LeCun made great progress with the LeNet model. Running LeNet on MNIST has to be the most classic computer vision thing ever. See e.g. activatedgeek/LeNet-5 for a minimal and modern PyTorch educational implementation.

But it is important to note that as of the 2010s, the benchmark had become too easy for many applications. It is perhaps fair to say that the next big dataset revolution of the same importance was with ImageNet.

The dataset could be downloaded from yann.lecun.com/exdb/mnist/ but as of March 2025 it was down and seems to have broken from time to time randomly, so Wayback Machine to the rescue:

wget \
 https://web.archive.org/web/20120828222752/http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz \
 https://web.archive.org/web/20120828182504/http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz \
 https://web.archive.org/web/20240323235739/http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz \
 https://web.archive.org/web/20240328174015/http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz

but doing so is kind of pointless as both files use some crazy single-file custom binary format to store all images and labels. OMG!
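That said, the format (IDX) is simple enough to read by hand: two zero bytes, one byte for the data type (0x08 for unsigned byte), one byte for the number of dimensions, then one big-endian 32-bit size per dimension, then the raw data. A minimal sketch using only the standard library, exercised on a tiny synthetic buffer rather than the real files; `parse_idx` is a hypothetical helper and only handles the unsigned-byte case.

```python
import struct


def parse_idx(data):
    """Parse an unsigned-byte IDX byte string into (dims, payload)."""
    zero, type_code, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0 or type_code != 0x08:
        raise ValueError("not an unsigned-byte IDX file")
    header_end = 4 + 4 * ndim
    # One big-endian 32-bit unsigned size per dimension.
    dims = struct.unpack(f">{ndim}I", data[4:header_end])
    payload = data[header_end:]
    expected = 1
    for dim in dims:
        expected *= dim
    if len(payload) != expected:
        raise ValueError("payload size does not match header")
    return dims, payload


# Tiny synthetic "image file": 2 images of 2x3 pixels, values 0..11.
sample = struct.pack(">HBB3I", 0, 0x08, 3, 2, 2, 3) + bytes(range(12))
dims, pixels = parse_idx(sample)
print(dims, list(pixels[:6]))
```

For the real files, reading the bytes with `gzip.open(path, "rb").read()` and passing them to `parse_idx` should give dims of (60000, 28, 28) for the training images and (60000,) for the training labels.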

OK-ish data explorer: knowyourdata-tfds.withgoogle.com/#tab=STATS&dataset=mnist
