330K images (>200K labeled)
1.5 million object instances
80 object categories
91 stuff categories
5 captions per image. A caption is a short textual description of the image.

So the number of object categories is relatively small, but the focus seems to be on having a bunch of different objects in the same image. E.g. there are 13 photos that contain both a cat and a pizza. Searching for such weird combinations is kind of fun.
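The same kind of combination search can be sketched offline against the annotation JSON. The snippet below is a minimal sketch, assuming the usual COCO "instances" layout with `categories` and `annotations` lists; the `sample` data and the `images_with_all` helper are made up for illustration and are not part of any official tool.

```python
from collections import defaultdict

# Made-up miniature of a COCO "instances" annotation file. Only the fields
# used below are included, and the image IDs are hypothetical.
sample = {
    "categories": [
        {"id": 1, "name": "person"},
        {"id": 17, "name": "cat"},
        {"id": 59, "name": "pizza"},
    ],
    "annotations": [
        {"image_id": 1, "category_id": 17},
        {"image_id": 1, "category_id": 59},
        {"image_id": 2, "category_id": 17},
        {"image_id": 3, "category_id": 59},
        {"image_id": 3, "category_id": 1},
    ],
}


def images_with_all(dataset, names):
    """Return sorted IDs of images that contain every named category."""
    name_to_id = {c["name"]: c["id"] for c in dataset["categories"]}
    wanted = {name_to_id[n] for n in names}
    # Collect the set of category IDs seen in each image.
    cats_per_image = defaultdict(set)
    for ann in dataset["annotations"]:
        cats_per_image[ann["image_id"]].add(ann["category_id"])
    return sorted(i for i, cats in cats_per_image.items() if wanted <= cats)


print(images_with_all(sample, ["cat", "pizza"]))  # prints [1]
```

With a real annotation file, `sample` would be replaced by the result of `json.load` on that file, assuming it follows the same layout.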

Their official dataset explorer is actually good: cocodataset.org/#explore

And the objects don't just have bounding boxes, but also detailed segmentation polygons.
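To make the difference between a polygon and a bounding box concrete, here is a minimal sketch. It assumes a polygon is stored as a flat list `[x1, y1, x2, y2, ...]` and a bounding box as `[x, y, width, height]`, which is how COCO annotations usually represent them. The helper names and the rectangle used as input are made up for illustration.

```python
def polygon_bbox(flat):
    """Bounding box [x, y, width, height] of a flat [x1, y1, x2, y2, ...] polygon."""
    xs, ys = flat[0::2], flat[1::2]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]


def polygon_area(flat):
    """Area of a simple polygon via the shoelace formula."""
    xs, ys = flat[0::2], flat[1::2]
    n = len(xs)
    twice_area = sum(
        xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n)
    )
    return abs(twice_area) / 2


# Hypothetical polygon: a 40 x 60 rectangle with its top-left corner at (10, 20).
poly = [10, 20, 50, 20, 50, 80, 10, 80]
print(polygon_bbox(poly))  # prints [10, 20, 40, 60]
print(polygon_area(poly))  # prints 2400.0
```

For a non-rectangular object the polygon area is smaller than the bounding box area, which is the extra detail the polygons carry.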

Also, images have captions describing the relation between objects:

a black and white cat standing on a table next to a pizza.
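Captions can be searched by keyword in a similar way. This is a minimal sketch, assuming the captions file has an `annotations` list whose entries carry `image_id` and `caption` fields. Apart from the caption quoted above, the sample entries, the IDs, and the `images_mentioning` helper are made up.

```python
# Made-up miniature of a COCO captions annotation file.
sample_captions = {
    "annotations": [
        {"id": 10, "image_id": 1,
         "caption": "a black and white cat standing on a table next to a pizza."},
        {"id": 11, "image_id": 1, "caption": "A cat looks at a pizza."},  # made up
        {"id": 12, "image_id": 2, "caption": "A dog runs on a beach."},  # made up
    ]
}


def images_mentioning(dataset, *words):
    """Sorted IDs of images with at least one caption containing every word."""
    hits = set()
    for ann in dataset["annotations"]:
        text = ann["caption"].lower()
        # Plain substring match, so "cat" would also match "catalog".
        if all(w.lower() in text for w in words):
            hits.add(ann["image_id"])
    return sorted(hits)


print(images_mentioning(sample_captions, "cat", "pizza"))  # prints [1]
```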

Epic.

This dataset is kind of cool.

Original 2014 paper by Microsoft: arxiv.org/abs/1405.0312

Table of contents
- COCO subset
  - COCO 2017

COCO subset

COCO 2017

This is the one used on MLPerf v2.1 ResNet, and it is likely one of the most popular choices out there.

2017 challenge subset:

train: 118k images, 18GB
validation: 5k images, 1GB
test: 41k images, 6GB

