- 330K images (>200K labeled)
- 1.5 million object instances
- 80 object categories
- 91 stuff categories
- 5 captions per image. A caption is a short textual description of the image.
So they have relatively few object categories, but their focus seems to be on images that contain several objects at once. E.g. they have 13 photos containing both a cat and a pizza. Searching for such weird combinations is kind of fun.
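A search for such combinations can also be done offline on the annotation JSON. The following is a minimal sketch, assuming the standard COCO instances layout with a `categories` list (`id`, `name`) and an `annotations` list (`image_id`, `category_id`). The function name `images_with_all` and the toy data, including its category ids, are made up for illustration.

```python
def images_with_all(coco, names):
    """Return sorted ids of images that contain every category in `names`."""
    name_to_id = {c["name"]: c["id"] for c in coco["categories"]}
    wanted = {name_to_id[n] for n in names}
    seen = {}  # image_id -> set of wanted category ids found in that image
    for ann in coco["annotations"]:
        if ann["category_id"] in wanted:
            seen.setdefault(ann["image_id"], set()).add(ann["category_id"])
    return sorted(i for i, cats in seen.items() if cats == wanted)


# Toy data in the same shape as an instances annotation file (ids are made up).
toy = {
    "categories": [
        {"id": 1, "name": "cat"},
        {"id": 2, "name": "pizza"},
        {"id": 3, "name": "dog"},
    ],
    "annotations": [
        {"image_id": 1, "category_id": 1},
        {"image_id": 1, "category_id": 2},
        {"image_id": 2, "category_id": 1},
        {"image_id": 3, "category_id": 2},
        {"image_id": 3, "category_id": 3},
    ],
}

print(images_with_all(toy, ["cat", "pizza"]))  # [1]
```

On the real dataset, `coco` would be the result of `json.load` on one of the instances annotation files.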
Their official dataset explorer is actually good:
And the objects don't just have bounding boxes, but detailed polygons.
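To make the difference concrete: in the annotation JSON, a polygon is stored as a flat list `[x1, y1, x2, y2, ...]`, while a bounding box is `[x, y, width, height]`. The sketch below computes the area of one polygon with the shoelace formula and derives its bounding box. The helper names are hypothetical, and it only handles a single polygon; crowd regions are stored differently (as run-length encoded masks) and are not covered.

```python
def polygon_area(flat):
    """Area of a polygon given as a flat [x1, y1, x2, y2, ...] list (shoelace formula)."""
    xs, ys = flat[0::2], flat[1::2]
    n = len(xs)
    total = 0.0
    for i in range(n):
        j = (i + 1) % n  # next vertex, wrapping around to the first
        total += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(total) / 2.0


def polygon_bbox(flat):
    """Bounding box of the polygon as [x, y, width, height]."""
    xs, ys = flat[0::2], flat[1::2]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]


# A made-up 40x40 square with its top-left corner at (10, 20).
square = [10, 20, 50, 20, 50, 60, 10, 60]
print(polygon_area(square))  # 1600.0
print(polygon_bbox(square))  # [10, 20, 40, 40]
```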
Also, images have captions describing the relation between objects:
a black and white cat standing on a table next to a pizza.
Epic.
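Since each image has several captions, it is handy to group them by image. A minimal sketch, assuming the captions annotation file has an `annotations` list whose entries carry `image_id` and `caption`. The function name and the toy entries (other than the cat and pizza caption quoted above) are made up.

```python
from collections import defaultdict


def captions_by_image(captions_json):
    """Map each image_id to the list of its captions, whitespace-trimmed."""
    grouped = defaultdict(list)
    for ann in captions_json["annotations"]:
        grouped[ann["image_id"]].append(ann["caption"].strip())
    return dict(grouped)


# Toy data in the same shape as a captions annotation file.
toy_captions = {
    "annotations": [
        {"image_id": 1, "caption": "a black and white cat standing on a table next to a pizza."},
        {"image_id": 1, "caption": "a cat looks at a pizza "},  # made-up caption
        {"image_id": 2, "caption": "a dog on a beach"},  # made-up caption
    ]
}

for image_id, captions in captions_by_image(toy_captions).items():
    print(image_id, captions)
```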
This dataset is kind of cool.