{c}
{tag=Closed standard}
{title2=2009}
{wiki}

14 million images with more than 20k categories, typically denoting prominent objects in the image, either common daily objects or a wide range of animals. About 1 million of them also have <bounding boxes> for the objects. The images have different sizes; they are not all standardized to a single size like <MNIST>https://stackoverflow.com/questions/36109886/what-is-the-resolution-of-an-image-in-imagenet-dataset{ref}.
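
Since the sizes vary, models usually need some preprocessing to get a fixed input size. A minimal sketch of one common convention (resize the shorter side, then center crop), assuming the frequently used 256/224 values; the function name and defaults are illustrative and not something ImageNet itself prescribes:

```python
def resize_then_center_crop(width, height, short_side=256, crop=224):
    """Return (resized_size, crop_box) for an image of the given size.

    short_side=256 and crop=224 are assumed defaults, not ImageNet requirements.
    """
    # Scale so that the shorter side becomes `short_side`, keeping aspect ratio.
    scale = short_side / min(width, height)
    new_w = max(crop, round(width * scale))
    new_h = max(crop, round(height * scale))
    # Take a crop x crop window from the middle of the resized image.
    left = (new_w - crop) // 2
    top = (new_h - crop) // 2
    return (new_w, new_h), (left, top, left + crop, top + crop)


# Made-up example sizes: landscape, portrait, and a larger landscape image.
for size in [(500, 375), (375, 500), (1024, 768)]:
    print(size, resize_then_center_crop(*size))
```

The crop box is in (left, top, right, bottom) pixel coordinates of the resized image, so every input ends up as a 224x224 window regardless of its original size.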

Each image appears to have a single label associated with it. Care must have been taken somehow with categories, since some images contain several possible objects, e.g. a person and some object.

In practice, the <ILSVRC> subset of <ImageNet> is the most commonly used dataset.

Official project page: https://www.image-net.org/

The data license is restrictive and forbids commercial usage: https://www.image-net.org/download.php[]. As a result, you also have to log in to download the dataset. Super annoying.

How to visualize: https://datascience.stackexchange.com/questions/111756/where-can-i-view-the-imagenet-classes-as-a-hierarchy-on-wordnet

The categories are all part of <WordNet>, which means that there are several parent/child categories such as dog vs type of dog available. <ImageNet1k> only appears to have leaf nodes however (i.e. no "dog" label, just specific types of dog).
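
A minimal sketch of the parent/child idea, using a made-up toy hierarchy rather than real <WordNet> data (the `CHILDREN` dict and both helper functions are hypothetical): it lists the leaf categories under a node, and recovers the parent chain of a leaf.

```python
# Hypothetical toy hierarchy; real WordNet synsets and IDs are not reproduced here.
CHILDREN = {
    "animal": ["dog", "cat"],
    "dog": ["beagle", "pug"],
    "cat": ["tabby"],
}


def leaves(node, children=CHILDREN):
    """Return the leaf categories below `node`, in depth-first order."""
    kids = children.get(node, [])
    if not kids:
        return [node]
    result = []
    for kid in kids:
        result.extend(leaves(kid, children))
    return result


def ancestors(label, children=CHILDREN):
    """Return the chain of parents from `label` up to the root."""
    parent_of = {kid: parent for parent, kids in children.items() for kid in kids}
    chain = []
    while label in parent_of:
        label = parent_of[label]
        chain.append(label)
    return chain


print(leaves("animal"))     # ['beagle', 'pug', 'tabby']
print(ancestors("beagle"))  # ['dog', 'animal']
```

With a leaf-only label set, a coarser label such as "dog" is not stored directly, but it can be recovered by walking up the hierarchy as `ancestors` does.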

A major model that performed well on <ImageNet> starting in 2012 and became notable is <AlexNet>.


ImageNet

{c}
{title2=2014-}

https://storage.googleapis.com/openimages/web/index.html

TODO vs <COCO dataset>.

As of v7:
* ~9M images
* 600 object classes
* <bounding boxes>
* visual relationships are really hard: https://storage.googleapis.com/openimages/web/factsfigures_v7.html#visual-relationships e.g. "person kicking ball": https://storage.googleapis.com/openimages/web/visualizer/index.html?type=relationships&set=train&c=kick
* https://google.github.io/localized-narratives/ localized narratives is ludicrous: you can actually hear the annotators (<Indian> women mostly) describing the image while hovering their mouse to point at what they are talking about. They are clearly bored out of their minds, the poor people!
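
A minimal sketch of working with one of those <bounding boxes>, assuming boxes are given as normalized `XMin, XMax, YMin, YMax` values in the range 0 to 1, which is how I understand the Open Images box annotation files to be laid out. The helper names and the sample box are made up for illustration:

```python
def to_pixels(box, width, height):
    """Convert a normalized (xmin, xmax, ymin, ymax) box to pixel (left, top, right, bottom)."""
    xmin, xmax, ymin, ymax = box
    return (
        round(xmin * width),
        round(ymin * height),
        round(xmax * width),
        round(ymax * height),
    )


def area_fraction(box):
    """Return the fraction of the image area covered by a normalized box."""
    xmin, xmax, ymin, ymax = box
    return (xmax - xmin) * (ymax - ymin)


box = (0.25, 0.75, 0.5, 1.0)  # made-up values, not taken from the dataset
print(to_pixels(box, 640, 480))  # (160, 240, 480, 480)
print(area_fraction(box))        # 0.25
```

Normalized coordinates have the advantage that the same annotation stays valid if the image is resized.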

The images and annotations are both under <CC BY>, with <Google> as the copyright holder.


Open Images dataset

{c}
{tag=YOLO model}

https://github.com/fcakyon/yolov5-pip

OK, now we're talking: a two-liner and you get a window showing <bounding box> object detection from your <webcam> feed!
``
python -m pip install -U yolov5==7.0.9
yolov5 detect --source 0
``
The accuracy is crap for anything but people. But still. Well done. Tested on <Ubuntu 22.10>, <Ciro Santilli's hardware/P51>.

\Video[https://www.youtube.com/watch?v=1MD3Wn7e6OE]
{title=fcakyon/yolov5-pip webcam object detection demo by <Ciro Santilli> (2023)}

