Full Dataset

Register here to download the ADE20K dataset and annotations. By doing so, you agree to the terms of use.


See our GitHub Repository for an overview of how to access and explore ADE20K.

Scene Parsing Benchmark

Scene parsing data and part segmentation data derived from the ADE20K dataset can be downloaded from the MIT Scene Parsing Benchmark.

Terms of Use

See the ADE20K dataset Terms of Use.

Training set
25,574 images

All images are fully annotated with objects, and many of the images have parts too.

Validation set
2,000 images

Fully annotated with objects and parts.

Test set
Images to be released later.

Consistency set
64 images and annotations used for checking annotation consistency (download)


The annotated images cover the scene categories from the SUN and Places databases. The following examples show images together with their object and part segmentations:

The next visualization provides the list of objects and parts and the number of annotated instances. The tree only shows objects with more than 250 annotated instances and parts with more than 10 annotated instances.

Some classes can be both objects and parts. For instance, a "door" can be an object (in an indoor picture) or a part (when it is the "door" of a "car"). Some objects are always parts (e.g., a "leg", a "hand", ...), although in some cases they can appear detached from the whole (e.g., a car "wheel" inside a garage), and some objects are never parts (e.g., a "person", a "truck", ...). The same class name (e.g., "door") can correspond to several visual categories depending on which object it is a part of. For instance, a car door is visually different from a cabinet door or a building door, although they share similar affordances.

The value proportionClassIsPart(c) can be used to decide whether a class c behaves mostly as an object or as a part. When an object is not part of another object, its segmentation mask appears inside *_seg.png. If the class behaves as a part, its segmentation mask appears inside *_seg_parts.png. Correctly detecting an object therefore requires determining whether it is behaving as an independent object or as a part of another object.
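As a rough illustration of how proportionClassIsPart(c) might be used, the Python sketch below assumes the value is the fraction of a class's annotated instances that appear as parts. That definition, the ClassCounts container, the 0.5 threshold, and all function names are assumptions made for this example and are not taken from the dataset's own tools. Only the two mask suffixes come from the description above.

```python
from dataclasses import dataclass


@dataclass
class ClassCounts:
    """Instance counts for one class name (hypothetical container)."""
    name: str
    as_object: int  # instances annotated as independent objects
    as_part: int    # instances annotated as parts of another object


def proportion_class_is_part(counts: ClassCounts) -> float:
    """Assumed definition: fraction of instances annotated as parts."""
    total = counts.as_object + counts.as_part
    if total == 0:
        raise ValueError(f"class {counts.name!r} has no annotated instances")
    return counts.as_part / total


def behaves_mostly_as_part(counts: ClassCounts, threshold: float = 0.5) -> bool:
    """True when the part proportion exceeds the (assumed) threshold."""
    return proportion_class_is_part(counts) > threshold


def expected_mask_suffix(is_part: bool) -> str:
    """Suffix of the mask file in which an instance is expected to appear."""
    return "_seg_parts.png" if is_part else "_seg.png"


if __name__ == "__main__":
    # Made-up counts, for illustration only; not real ADE20K statistics.
    door = ClassCounts(name="door", as_object=30, as_part=10)
    print(proportion_class_is_part(door))
    print(behaves_mostly_as_part(door))
    print(expected_mask_suffix(is_part=False))
```

Under these assumptions, a class such as "door" would be handled per instance: the class-level proportion only indicates its typical role, while each individual mask still lands in the object file or the parts file depending on whether that instance belongs to another object.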


If you find this dataset useful, please cite the following publication:

Scene Parsing through ADE20K Dataset. Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso and Antonio Torralba. Computer Vision and Pattern Recognition (CVPR), 2017. [PDF] [bib]

Semantic Understanding of Scenes through ADE20K Dataset. Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler, Adela Barriuso and Antonio Torralba. International Journal of Computer Vision (IJCV). [PDF] [bib]