ImageNet Large Scale Visual Recognition Challenge dataset
Updated 2024-12-15
Subset of ImageNet. About 167.62 GB in size according to www.kaggle.com/competitions/imagenet-object-localization-challenge/data.
Contains 1,281,167 training images and exactly 1k categories, which is why this dataset is also known as ImageNet1k: datascience.stackexchange.com/questions/47458/what-is-the-difference-between-imagenet-and-imagenet1k-how-to-download-it
www.kaggle.com/competitions/imagenet-object-localization-challenge/overview clarifies a bit further how the categories are inter-related according to WordNet relationships:
The 1000 object categories contain both internal nodes and leaf nodes of ImageNet, but do not overlap with each other.
image-net.org/challenges/LSVRC/2012/browse-synsets.php lists all 1k labels with their WordNet IDs, e.g.:
n02119789: kit fox, Vulpes macrotis
n02100735: English setter
n02096294: Australian terrier
There is a bug on that page however, towards the middle:
n03255030: dumbbell
href="ht:
n02102040: English springer, English springer spaniel
and there is one missing label if we ignore that dummy href= line. A thing of beauty! Also, the lines are not sorted by synset. If we sort them, the first three lines are:
n01440764: tench, Tinca tinca
n01443537: goldfish, Carassius auratus
n01484850: great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias
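Skipping the broken line and sorting by synset can be sketched roughly as follows. This is a minimal sketch, assuming the page has already been reduced to plain text with one `wnid: label` entry per line; the regular expression and the `parse_synsets` name are illustrative choices, not anything provided by the ImageNet site.

```python
import re

# A WordNet ID as shown on the page: "n" followed by 8 digits, then ": label".
SYNSET_RE = re.compile(r"^(n\d{8}): (.+)$")


def parse_synsets(text):
    """Return (wnid, label) pairs sorted by wnid, skipping malformed lines."""
    entries = []
    for line in text.splitlines():
        match = SYNSET_RE.match(line.strip())
        if match:  # lines such as 'href="ht:' do not match and are dropped
            entries.append((match.group(1), match.group(2)))
    return sorted(entries)


# A few lines in page order, including the broken one.
sample = """\
n02119789: kit fox, Vulpes macrotis
n03255030: dumbbell
href="ht:
n02102040: English springer, English springer spaniel
n01440764: tench, Tinca tinca
"""

for wnid, label in parse_synsets(sample):
    print(wnid, label)
```

Sorting the tuples works because every ID has the same width, so plain string comparison matches numeric order.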
gist.github.com/aaronpolhamus/964a4411c0906315deb9f4a3723aac57 has lines of type:
n02119789 1 kit_fox
n02100735 2 English_setter
n02110185 3 Siberian_husky
therefore numbered in the exact same order as image-net.org/challenges/LSVRC/2012/browse-synsets.php
gist.github.com/yrevar/942d3a0ac09ec9e5eb3a lists all 1k labels as a plaintext file with their benchmark IDs:
{0: 'tench, Tinca tinca',
1: 'goldfish, Carassius auratus',
2: 'great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias',
therefore numbered in the sorted order of image-net.org/challenges/LSVRC/2012/browse-synsets.php
The official line numbering in the benchmark data can be seen at LOC_synset_mapping.txt, e.g. www.kaggle.com/competitions/imagenet-object-localization-challenge/data?select=LOC_synset_mapping.txt:
n01440764 tench, Tinca tinca
n01443537 goldfish, Carassius auratus
n01484850 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias
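A rough sketch of turning such a file into a lookup table, assuming (as the sorted-order gist above suggests) that the 0-based line index is the benchmark ID. The function name `load_synset_mapping` is made up for illustration.

```python
def load_synset_mapping(lines):
    """Map 0-based line index -> (wnid, label) for LOC_synset_mapping.txt-style lines."""
    cleaned = [line.strip() for line in lines if line.strip()]
    mapping = {}
    for index, line in enumerate(cleaned):
        # The WordNet ID is the first space-separated token, the rest is the label.
        wnid, label = line.split(" ", 1)
        mapping[index] = (wnid, label)
    return mapping


sample = [
    "n01440764 tench, Tinca tinca",
    "n01443537 goldfish, Carassius auratus",
    "n01484850 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias",
]

mapping = load_synset_mapping(sample)
print(mapping[0])
```

With the real file, something like `load_synset_mapping(open("LOC_synset_mapping.txt"))` should behave the same way, since the function only needs an iterable of lines.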
huggingface.co/datasets/imagenet-1k also has some useful metrics on the split:
- train: 1,281,167 images, 145.7 GB zipped
- validation: 50,000 images, 6.67 GB zipped
- test: 100,000 images, 13.5 GB zipped
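As a quick sanity check, the three splits add up to 1,431,167 images and about 165.87 GB zipped, which is in the same ballpark as the 167.62 GB quoted on Kaggle. The sketch below just redoes that arithmetic; it assumes that GB means 10^9 bytes for the per-image estimate, which neither source states explicitly.

```python
# (image count, zipped size in GB) per split, as listed above.
splits = {
    "train": (1_281_167, 145.7),
    "validation": (50_000, 6.67),
    "test": (100_000, 13.5),
}

total_images = sum(count for count, _ in splits.values())
total_gb = round(sum(size for _, size in splits.values()), 2)
print("total images:", total_images)
print("total zipped GB:", total_gb)

# Rough average zipped size per image, assuming 1 GB = 10**9 bytes.
for name, (count, size_gb) in splits.items():
    print(name, round(size_gb * 1e9 / count / 1e3, 1), "kB per image")
```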
Subset generators:
- github.com/mf1024/ImageNet-datasets-downloader generates on download, very good. As per github.com/mf1024/ImageNet-Datasets-Downloader/issues/14 counts go over the limit due to bad multithreading. Also unfortunately it does not start with a subset of 1k.
- github.com/BenediktAlkin/ImageNetSubsetGenerator
Unfortunately, since ImageNet is a closed standard, no one can upload such pre-made subsets, forcing everybody to download the full ImageNet1k dataset, which is huge!
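Once the full dataset is on disk, carving out a small subset is not hard. Below is a minimal sketch, assuming the usual `train/<wnid>/<image>.JPEG` directory layout; the `make_subset` function is made up for illustration and is not taken from either of the tools above. It is demonstrated on a throwaway directory tree with empty files rather than on real images.

```python
import random
import shutil
import tempfile
from pathlib import Path


def make_subset(src_train, dst_train, n_classes, n_images, seed=0):
    """Copy up to n_images from each of n_classes randomly chosen class directories."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    src_train, dst_train = Path(src_train), Path(dst_train)
    class_dirs = sorted(p for p in src_train.iterdir() if p.is_dir())
    for class_dir in rng.sample(class_dirs, min(n_classes, len(class_dirs))):
        images = sorted(p for p in class_dir.iterdir() if p.is_file())
        out_dir = dst_train / class_dir.name
        out_dir.mkdir(parents=True, exist_ok=True)
        for image in rng.sample(images, min(n_images, len(images))):
            shutil.copy2(image, out_dir / image.name)


# Demo on a fake tree: 3 classes with 5 empty "images" each.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "train"
    for wnid in ("n01440764", "n01443537", "n01484850"):
        (src / wnid).mkdir(parents=True)
        for i in range(5):
            (src / wnid / f"{wnid}_{i}.JPEG").write_bytes(b"")
    dst = Path(tmp) / "subset"
    make_subset(src, dst, n_classes=2, n_images=3)
    picked = {d.name: len(list(d.iterdir())) for d in dst.iterdir()}
    print(picked)
```

This does nothing about the download problem itself: the full dataset is still needed as input.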
mlcommons.org/en/ Their homepage is not amazingly organized, but it does the job.
Benchmark focused on deep learning. It has two parts:
- training
- inference
Furthermore, a specific network model is specified for each benchmark in the closed category, so it goes beyond just specifying the dataset.
Results can be seen e.g. at:
- training: mlcommons.org/en/training-normal-21/
- inference: mlcommons.org/en/inference-datacenter-21/
And there are also separate repositories for each:
E.g. on mlcommons.org/en/training-normal-21/ we can see what the benchmarks are:
Dataset | Model |
---|---|
ImageNet | ResNet |
KiTS19 | 3D U-Net |
OpenImages | RetinaNet |
COCO dataset | Mask R-CNN |
LibriSpeech | RNN-T |
Wikipedia | BERT |
1TB Clickthrough | DLRM |
Go | MiniGo |
Instructions at:
Ubuntu 22.10 setup with tiny dummy manually generated ImageNet and run on ONNX:
sudo apt install pybind11-dev
git clone https://github.com/mlcommons/inference
cd inference
git checkout v2.1
virtualenv -p python3 .venv
. .venv/bin/activate
pip install numpy==1.24.2 pycocotools==2.0.6 onnxruntime==1.14.1 opencv-python==4.7.0.72 torch==1.13.1
cd loadgen
CFLAGS="-std=c++14" python setup.py develop
cd -
cd vision/classification_and_detection
python setup.py develop
wget -q https://zenodo.org/record/3157894/files/mobilenet_v1_1.0_224.onnx
export MODEL_DIR="$(pwd)"
export EXTRA_OPS='--time 10 --max-latency 0.2'
tools/make_fake_imagenet.sh
DATA_DIR="$(pwd)/fake_imagenet" ./run_local.sh onnxruntime mobilenet cpu --accuracy
Last line of output on P51, which appears to contain the benchmark results:
TestScenario.SingleStream qps=58.85, mean=0.0138, time=0.136, acc=62.500%, queries=8, tiles=50.0:0.0129,80.0:0.0137,90.0:0.0155,95.0:0.0171,99.0:0.0184,99.9:0.0187
where presumably qps means queries per second, and is the main result we are interested in, the more the better.
Running:
tools/make_fake_imagenet.sh
produces a tiny ImageNet subset with 8 images under fake_imagenet/.
fake_imagenet/val_map.txt contains:
val/800px-Porsche_991_silver_IAA.jpg 817
val/512px-Cacatua_moluccensis_-Cincinnati_Zoo-8a.jpg 89
val/800px-Sardinian_Warbler.jpg 13
val/800px-7weeks_old.JPG 207
val/800px-20180630_Tesla_Model_S_70D_2015_midnight_blue_left_front.jpg 817
val/800px-Welsh_Springer_Spaniel.jpg 156
val/800px-Jammlich_crop.jpg 233
val/782px-Pumiforme.JPG 285
- 817: 'sports car, sport car',
- 89: 'sulphur-crested cockatoo, Kakatoe galerita, Cacatua galerita',
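The numbers in val_map.txt therefore look like the 0-based benchmark IDs from the sorted label list. A small sketch of reading the file and resolving the IDs, assuming each line is `path<space>id`. The `parse_val_map` name is made up, and the label table below only contains the two entries quoted above.

```python
def parse_val_map(lines):
    """Parse 'relative/path.jpg <class id>' lines into (path, class_id) pairs."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last space so that paths containing spaces still work.
        path, class_id = line.rsplit(" ", 1)
        pairs.append((path, int(class_id)))
    return pairs


# Only the two labels quoted above; the full table has 1000 entries.
labels = {
    817: "sports car, sport car",
    89: "sulphur-crested cockatoo, Kakatoe galerita, Cacatua galerita",
}

val_map = [
    "val/800px-Porsche_991_silver_IAA.jpg 817",
    "val/512px-Cacatua_moluccensis_-Cincinnati_Zoo-8a.jpg 89",
    "val/800px-Sardinian_Warbler.jpg 13",
]

for path, class_id in parse_val_map(val_map):
    print(class_id, labels.get(class_id, "(label not listed here)"), path)
```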
TODO prepare and test on the actual ImageNet validation set, README says:
Prepare the imagenet dataset to come.
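One untested idea for that TODO is to build val_map.txt from the Kaggle download. The sketch below rests on several assumptions: that the Kaggle file LOC_val_solution.csv has `ImageId` and `PredictionString` columns, that the first token of `PredictionString` is the WordNet ID, that class IDs are 0-based line indexes of LOC_synset_mapping.txt, and that the benchmark accepts `val/<image>.JPEG <id>` lines like the fake dataset does. The CSV rows are made up for illustration and `build_val_map` is a hypothetical helper.

```python
import csv
import io


def build_val_map(solution_csv, synset_lines, prefix="val/", ext=".JPEG"):
    """Return val_map.txt-style lines from a solution CSV and a synset mapping."""
    # Class ID = 0-based position of the WordNet ID in the mapping file (assumption).
    wnids = [line.split(" ", 1)[0] for line in synset_lines if line.strip()]
    index_of = {wnid: index for index, wnid in enumerate(wnids)}
    out = []
    for row in csv.DictReader(solution_csv):
        # Assumed format: "wnid x1 y1 x2 y2 [wnid x1 y1 x2 y2 ...]".
        wnid = row["PredictionString"].split()[0]
        out.append(f"{prefix}{row['ImageId']}{ext} {index_of[wnid]}")
    return out


synsets = [
    "n01440764 tench, Tinca tinca",
    "n01443537 goldfish, Carassius auratus",
    "n01484850 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias",
]

# Made-up rows, not real validation labels.
sample_csv = io.StringIO(
    "ImageId,PredictionString\n"
    "example_val_0001,n01443537 10 20 300 400\n"
    "example_val_0002,n01440764 5 5 100 100 n01440764 120 5 200 100\n"
)

val_map_lines = build_val_map(sample_csv, synsets)
print("\n".join(val_map_lines))
```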
Since that one is undocumented, let's try the COCO dataset instead, which uses COCO 2017 and is also a bit smaller. Note that this is not part of MLperf anymore since v2.1, only ImageNet and OpenImages are used. But still:
wget https://zenodo.org/record/4735652/files/ssd_mobilenet_v1_coco_2018_01_28.onnx
DATA_DIR_BASE=/mnt/data/coco
export DATA_DIR="${DATA_DIR_BASE}/val2017-300"
mkdir -p "$DATA_DIR_BASE"
cd "$DATA_DIR_BASE"
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
unzip val2017.zip
unzip annotations_trainval2017.zip
mv annotations val2017
cd -
cd "$(git rev-parse --show-toplevel)"
python tools/upscale_coco/upscale_coco.py --inputs "$DATA_DIR_BASE" --outputs "$DATA_DIR" --size 300 300 --format png
cd -
Now:
./run_local.sh onnxruntime mobilenet cpu --accuracy
fails immediately with:
No such file or directory: '/path/to/coco/val2017-300/val_map.txt
The more plausible looking:
./run_local.sh onnxruntime mobilenet cpu --accuracy --dataset coco-300
first takes a while, most likely to preprocess something, which it does only once, and then fails:
Traceback (most recent call last):
File "/home/ciro/git/inference/vision/classification_and_detection/python/main.py", line 596, in <module>
main()
File "/home/ciro/git/inference/vision/classification_and_detection/python/main.py", line 468, in main
ds = wanted_dataset(data_path=args.dataset_path,
File "/home/ciro/git/inference/vision/classification_and_detection/python/coco.py", line 115, in __init__
self.label_list = np.array(self.label_list)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (5000, 2) + inhomogeneous part.
TODO!
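That ValueError is what NumPy 1.24 and later raise when np.array is handed a ragged nested sequence without an explicit dtype. Older releases only emitted a deprecation warning and fell back to an object array, so the numpy==1.24.2 pin above is a likely suspect. Here is a minimal reproduction with made-up data shaped like the error message suggests (a list of 2-element entries whose inner lengths differ); it is not the actual contents of label_list in coco.py.

```python
import numpy as np

# Made-up stand-in for label_list: each entry pairs a list of class ids with
# a flat list of box coordinates, and the inner lengths differ per image.
label_list = [
    [[1], [0.1, 0.2, 0.3, 0.4]],
    [[1, 2], [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]],
]

try:
    np.array(label_list)
    implicit_conversion_failed = False
except ValueError as error:
    # NumPy >= 1.24 lands here with the "inhomogeneous shape" message.
    implicit_conversion_failed = True
    print(error)

# Asking for an object array explicitly is accepted by old and new NumPy alike.
as_objects = np.array(label_list, dtype=object)
print(as_objects.dtype, len(as_objects))
```

Two things to try, both untested against the benchmark itself: pass dtype=object on line 115 of coco.py, or install a NumPy release older than 1.24.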