14 million images with more than 20k categories, typically denoting prominent objects in the image, either common daily objects or a wide range of animals. About 1 million of them also have bounding boxes for the objects. The images have different sizes; they are not all standardized to a single size like MNIST.
Each image appears to have a single label associated with it. Care must have been taken somehow with categories, since some images contain several possible objects, e.g. a person and some object.
Official project page: www.image-net.org/
The data license is restrictive and forbids commercial usage: www.image-net.org/download.php. As a result, you also have to log in to download the dataset. Super annoying.
How to visualize: datascience.stackexchange.com/questions/111756/where-can-i-view-the-imagenet-classes-as-a-hierarchy-on-wordnet
The categories are all part of WordNet, which means that there are several parent/child categories such as dog vs type of dog available. ImageNet1k only appears to have leaf nodes however (i.e. no "dog" label, just specific types of dog).
ImageNet Large Scale Visual Recognition Challenge dataset
Subset of ImageNet. About 167.62 GB in size according to www.kaggle.com/competitions/imagenet-object-localization-challenge/data.
Contains 1,281,167 images and exactly 1k categories which is why this dataset is also known as ImageNet1k: datascience.stackexchange.com/questions/47458/what-is-the-difference-between-imagenet-and-imagenet1k-how-to-download-it
www.kaggle.com/competitions/imagenet-object-localization-challenge/overview clarifies a bit further how the categories are inter-related according to WordNet relationships:
The 1000 object categories contain both internal nodes and leaf nodes of ImageNet, but do not overlap with each other.
image-net.org/challenges/LSVRC/2012/browse-synsets.php lists all 1k labels with their WordNet IDs:
n02119789: kit fox, Vulpes macrotis
n02100735: English setter
n02096294: Australian terrier
There is a bug on that page however towards the middle:
n03255030: dumbbell
n02102040: English springer, English springer spaniel
and there is one missing label if we ignore that dummy line. A thing of beauty!
Also the lines are not sorted by synset. If we sort them, the first three lines are:
n01440764: tench, Tinca tinca
n01443537: goldfish, Carassius auratus
n01484850: great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias
gist.github.com/aaronpolhamus/964a4411c0906315deb9f4a3723aac57 has lines of type:
n02119789 1 kit_fox
n02100735 2 English_setter
n02110185 3 Siberian_husky
and is therefore numbered in the exact same order as image-net.org/challenges/LSVRC/2012/browse-synsets.php
gist.github.com/yrevar/942d3a0ac09ec9e5eb3a lists all 1k labels as a plaintext file with their benchmark IDs:
{0: 'tench, Tinca tinca',
1: 'goldfish, Carassius auratus',
2: 'great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias',
and is therefore numbered in the sorted order of image-net.org/challenges/LSVRC/2012/browse-synsets.php
The official line numbering in the benchmark data can be seen in LOC_synset_mapping.txt at www.kaggle.com/competitions/imagenet-object-localization-challenge/data?select=LOC_synset_mapping.txt, e.g.:
n01440764 tench, Tinca tinca
n01443537 goldfish, Carassius auratus
n01484850 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias
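The link between the sorted WordNet IDs and the 0-based benchmark IDs can be sketched in Python. This is only a minimal sketch: it assumes the three-line excerpt stands in for the full file, and `parse_synset_mapping` is a hypothetical helper name, not part of any ImageNet tooling.

```python
def parse_synset_mapping(text):
    """Parse 'wnid label, label, ...' lines into (wnid, label) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The WordNet ID is the first space-separated token, the rest is the label.
        wnid, label = line.split(" ", 1)
        pairs.append((wnid, label))
    return pairs


# Excerpt only (deliberately shuffled); the real file has 1000 lines.
EXCERPT = """\
n01443537 goldfish, Carassius auratus
n01484850 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias
n01440764 tench, Tinca tinca
"""

# Sorting by WordNet ID and enumerating from 0 gives the benchmark index.
pairs = sorted(parse_synset_mapping(EXCERPT))
index_to_label = {i: label for i, (_, label) in enumerate(pairs)}
wnid_to_index = {wnid: i for i, (wnid, _) in enumerate(pairs)}

print(index_to_label[0])
print(wnid_to_index["n01484850"])
```

On this excerpt, index 0 comes out as the tench, which agrees with the `{0: 'tench, Tinca tinca', ...` listing.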
huggingface.co/datasets/imagenet-1k also has some useful metrics on the split:
- train: 1,281,167 images, 145.7 GB zipped
- validation: 50,000 images, 6.67 GB zipped
- test: 100,000 images, 13.5 GB zipped
Subset generators:
- github.com/mf1024/ImageNet-datasets-downloader generates on download, very good. As per github.com/mf1024/ImageNet-Datasets-Downloader/issues/14 counts go over the limit due to bad multithreading. Also unfortunately it does not start with a subset of 1k.
- github.com/BenediktAlkin/ImageNetSubsetGenerator
Unfortunately, since ImageNet is a closed standard, no one can upload such pre-made subsets, forcing everybody to download the full dataset, ImageNet1k in this case, which is huge!
mlcommons.org/en/ Their homepage is not amazingly organized, but it does the job.
Benchmark focused on deep learning. It has two parts: training and inference.
Furthermore, a specific network model is specified for each benchmark in the closed category, so it goes beyond just specifying the dataset.
Results can be seen e.g. at:
Those URLs broke as of 2025 of course, now you have to click on their Tableau down to the 2.1 round and there's no fixed URL for it:
And there are also separate repositories for each:
E.g. on mlcommons.org/en/training-normal-21/ we can see what the benchmarks are:
Dataset | Model |
ImageNet | ResNet |
KiTS19 | 3D U-Net |
OpenImages | RetinaNet |
COCO dataset | Mask R-CNN |
LibriSpeech | RNN-T |
Wikipedia | BERT |
1TB Clickthrough | DLRM |
Go | MiniGo |
Instructions at:
Ubuntu 22.10 setup with tiny dummy manually generated ImageNet and run on ONNX:
sudo apt install pybind11-dev
git clone https://github.com/mlcommons/inference
cd inference
git checkout v2.1
virtualenv -p python3 .venv
. .venv/bin/activate
pip install numpy==1.24.2 pycocotools==2.0.6 onnxruntime==1.14.1 opencv-python== torch==1.13.1
cd loadgen
CFLAGS="-std=c++14" python setup.py develop
cd -
cd vision/classification_and_detection
python setup.py develop
wget -q https://zenodo.org/record/3157894/files/mobilenet_v1_1.0_224.onnx
export MODEL_DIR="$(pwd)"
export EXTRA_OPS='--time 10 --max-latency 0.2'
DATA_DIR="$(pwd)/fake_imagenet" ./run_local.sh onnxruntime mobilenet cpu --accuracy
Last line of output on P51, which appears to contain the benchmark results:
TestScenario.SingleStream qps=58.85, mean=0.0138, time=0.136, acc=62.500%, queries=8, tiles=50.0:0.0129,80.0:0.0137,90.0:0.0155,95.0:0.0171,99.0:0.0184,99.9:0.0187
where presumably qps means queries per second, and is the main result we are interested in, the more the better.
The tiny dummy ImageNet subset has 8 images under fake_imagenet, where the label map contains:
val/800px-Porsche_991_silver_IAA.jpg 817
val/512px-Cacatua_moluccensis_-Cincinnati_Zoo-8a.jpg 89
val/800px-Sardinian_Warbler.jpg 13
val/800px-7weeks_old.JPG 207
val/800px-20180630_Tesla_Model_S_70D_2015_midnight_blue_left_front.jpg 817
val/800px-Welsh_Springer_Spaniel.jpg 156
val/800px-Jammlich_crop.jpg 233
val/782px-Pumiforme.JPG 285
- 817: 'sports car, sport car',
- 89: 'sulphur-crested cockatoo, Kakatoe galerita, Cacatua galerita',
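To check that the image/label lines are coherent with the label names, they can be parsed with a few lines of Python. This is a rough sketch: `parse_val_map` is a hypothetical helper, and `KNOWN_LABELS` only holds the two labels quoted above, so every other index is left unnamed.

```python
from collections import Counter

# The eight image/label lines shown above, copied verbatim.
VAL_MAP = """\
val/800px-Porsche_991_silver_IAA.jpg 817
val/512px-Cacatua_moluccensis_-Cincinnati_Zoo-8a.jpg 89
val/800px-Sardinian_Warbler.jpg 13
val/800px-7weeks_old.JPG 207
val/800px-20180630_Tesla_Model_S_70D_2015_midnight_blue_left_front.jpg 817
val/800px-Welsh_Springer_Spaniel.jpg 156
val/800px-Jammlich_crop.jpg 233
val/782px-Pumiforme.JPG 285
"""

# Only the two labels quoted in the text; this is not the full 1k table.
KNOWN_LABELS = {
    817: "sports car, sport car",
    89: "sulphur-crested cockatoo, Kakatoe galerita, Cacatua galerita",
}


def parse_val_map(text):
    """Return (relative_path, class_index) pairs from 'path index' lines."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the last space so that odd file names stay intact.
        path, index = line.rsplit(" ", 1)
        entries.append((path, int(index)))
    return entries


entries = parse_val_map(VAL_MAP)
counts = Counter(index for _, index in entries)

for path, index in entries:
    print(index, KNOWN_LABELS.get(index, "(label not listed here)"), path)
```

Both car pictures map to 817, so that index shows up twice in `counts`.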
TODO prepare and test on the actual ImageNet validation set, README says:
Prepare the imagenet dataset to come.
Since that one is undocumented, let's try the COCO dataset instead, which uses COCO 2017 and is also a bit smaller. Note that this is not part of MLperf anymore since v2.1, only ImageNet and Open Images are used. But still:
wget https://zenodo.org/record/4735652/files/ssd_mobilenet_v1_coco_2018_01_28.onnx
DATA_DIR_BASE=/path/to/coco
export DATA_DIR="${DATA_DIR_BASE}/val2017-300"
mkdir -p "$DATA_DIR_BASE"
cd "$DATA_DIR_BASE"
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
unzip val2017.zip
unzip annotations_trainval2017.zip
mv annotations val2017
cd -
cd "$(git rev-parse --show-toplevel)"
python tools/upscale_coco/upscale_coco.py --inputs "$DATA_DIR_BASE" --outputs "$DATA_DIR" --size 300 300 --format png
cd -
Now:
./run_local.sh onnxruntime mobilenet cpu --accuracy
fails immediately with:
No such file or directory: '/path/to/coco/val2017-300/val_map.txt'
The more plausible looking:
./run_local.sh onnxruntime mobilenet cpu --accuracy --dataset coco-300
first takes a while to preprocess something most likely, which it does only once, and then fails:
Traceback (most recent call last):
File "/home/ciro/git/inference/vision/classification_and_detection/python/main.py", line 596, in <module>
File "/home/ciro/git/inference/vision/classification_and_detection/python/main.py", line 468, in main
ds = wanted_dataset(data_path=args.dataset_path,
File "/home/ciro/git/inference/vision/classification_and_detection/python/coco.py", line 115, in __init__
self.label_list = np.array(self.label_list)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (5000, 2) + inhomogeneous part.
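The same class of error can be reproduced in isolation with a ragged nested list. The sketch below uses made-up stand-in data, not the actual contents of `self.label_list` in coco.py. Recent NumPy releases such as the 1.24.2 pinned above raise on this input, while older releases only emitted a deprecation warning and built an object array. Whether `dtype=object` is the right fix for coco.py itself is untested here.

```python
import numpy as np

# Hypothetical stand-in data: every row has 2 entries, but the second entry
# is a list whose length varies from row to row.
ragged = [[1, [10, 20]], [2, [30]]]

try:
    np.array(ragged)
    outcome = "no error (older NumPy builds an object array with a warning)"
except ValueError as error:
    outcome = f"ValueError: {error}"
print(outcome)

# Asking for an object array explicitly is accepted: the inner lists are
# stored as plain Python objects inside a 2x2 array.
as_objects = np.array(ragged, dtype=object)
print(as_objects.shape, as_objects.dtype)
```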
70,000 28x28 grayscale (1 byte per pixel) images of hand-written digits 0-9, i.e. 10 categories. 60k are considered training data, 10k are considered for test data.
This is THE "OG" computer vision dataset.
Playing with it is the de-facto computer vision hello world.
It was on this dataset that Yann LeCun made great progress with the LeNet model. Running LeNet on MNIST has to be the most classic computer vision thing ever. See e.g. activatedgeek/LeNet-5 for a minimal and modern PyTorch educational implementation.
But it is important to note that as of the 2010s, the benchmark had become too easy for many applications. It is perhaps fair to say that the next big dataset revolution of the same importance was with ImageNet.
The dataset could be downloaded from yann.lecun.com/exdb/mnist/ but as of March 2025 it was down, and it seems to have broken randomly from time to time, so Wayback Machine to the rescue:
wget \
https://web.archive.org/web/20120828222752/http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz \
https://web.archive.org/web/20120828182504/http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz \
https://web.archive.org/web/20240323235739/http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
but doing so is kind of pointless as both files use some crazy single-file custom binary format to store all images and labels. OMG!
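The binary format in question is IDX, and it is simple enough to decode by hand. It has a 4-byte magic number (two zero bytes, a data type code, the number of dimensions), then one big-endian 32-bit size per dimension, then the raw data. Below is a minimal sketch that handles only the unsigned-byte type code 0x08; `parse_idx` and `read_idx_gz` are hypothetical helper names, and the self-check runs on synthetic bytes rather than on the real files.

```python
import gzip
import struct


def parse_idx(raw):
    """Parse IDX bytes holding unsigned-byte data; return (dims, data)."""
    # Magic number: two zero bytes, a type code, then the dimension count.
    zero, dtype_code, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0 or dtype_code != 0x08:
        raise ValueError("not an unsigned-byte IDX file")
    # One big-endian unsigned 32-bit size per dimension.
    dims = struct.unpack(f">{ndim}I", raw[4:4 + 4 * ndim])
    data = raw[4 + 4 * ndim:]
    expected = 1
    for dim in dims:
        expected *= dim
    if len(data) != expected:
        raise ValueError("payload size does not match header")
    return dims, data


def read_idx_gz(path):
    """Read a gzipped IDX file such as train-images-idx3-ubyte.gz."""
    with gzip.open(path, "rb") as f:
        return parse_idx(f.read())


# Synthetic self-check: two 2x3 "images" with pixel values 0..11.
header = struct.pack(">HBB", 0, 0x08, 3) + struct.pack(">3I", 2, 2, 3)
sample = gzip.compress(header + bytes(range(12)))
dims, data = parse_idx(gzip.decompress(sample))
print(dims, list(data[:6]))
```

Assuming the archived files are intact, `read_idx_gz("train-images-idx3-ubyte.gz")` should report dimensions `(60000, 28, 28)`, and the matching labels file `(60000,)`, in line with the 60k training images of 28x28 pixels mentioned above.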
OK-ish data explorer: knowyourdata-tfds.withgoogle.com/#tab=STATS&dataset=mnist