Upgrading to Vistas 2.0

Today we are making available Mapillary Vistas 2.0, a major semantic annotation upgrade for our street-level image dataset of 25,000 images from around the world. We are increasing the label complexity by almost doubling the number of semantic categories, and additionally providing an approximate depth order of the objects shown in each scene.

With the initial release of the Mapillary Vistas dataset to researchers in May 2017, we invited the computer vision research community to work on challenging, real-world street-level data for image understanding tasks, including semantic and instance-specific segmentation, and later also panoptic segmentation. Our idea for Vistas was to compile a representative set of images from the Mapillary platform and make it available together with corresponding high-quality, pixel-wise, human-drawn annotations delineating semantic concepts and objects in the data. With its scale, Vistas is ideally suited for training modern, deep-learning-based recognition models that provide fine-grained semantic interpretation of street-level data. Below, you can see results on a test video, obtained from our latest state-of-the-art panoptic segmentation model trained on Mapillary Vistas 2.0.

Over the last few years, we co-organised multiple segmentation challenges and workshops, such as LSUN @ CVPR 2017, the Joint COCO and Mapillary Workshops at ECCV 2018 and ICCV 2019, and the Robust Vision Challenge at CVPR 2020. Thanks to advances in the field, we observed impressive performance gains. In semantic segmentation, mean Intersection-over-Union (a key performance metric) improved from 41% to 61% in only three years.
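For reference, mean Intersection-over-Union averages, per class, the overlap between predicted and ground-truth pixels divided by their union. A minimal NumPy sketch of the metric (array shapes and the ignore value are illustrative, not part of any official evaluation code):

```python
import numpy as np

def mean_iou(pred, gt, num_classes, ignore_index=255):
    """Mean Intersection-over-Union between two dense label maps."""
    valid = gt != ignore_index                     # drop unlabelled pixels
    pred = pred[valid].astype(np.int64)
    gt = gt[valid].astype(np.int64)

    # Confusion matrix: rows are ground truth, columns are predictions.
    conf = np.bincount(gt * num_classes + pred,
                       minlength=num_classes ** 2).reshape(num_classes, -1)

    tp = np.diag(conf)                       # correctly labelled pixels
    union = conf.sum(0) + conf.sum(1) - tp   # predicted + actual - overlap
    iou = tp / np.maximum(union, 1)          # guard against empty classes
    return iou[union > 0].mean()             # average over classes present
```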

Despite the advent of several street-level datasets since our initial release of Vistas, it remains one of the most challenging datasets due to characteristics such as:

  • Worldwide spread of image capture locations
  • Diverse weather and illumination conditions, including imagery from snowy, rainy, or foggy environments, or captured at dusk/dawn
  • High-resolution images ranging from 2 to 22 megapixels
  • Heterogeneous set of camera models and lenses used for image capture

We now release Mapillary Vistas 2.0, an updated version with substantially increased granularity of semantic labels that preserves compatibility with the RGB images used in the training/validation/test splits. While the initial version contained annotations for 66 object classes (37 of them annotated instance-specifically), the new release features 124 semantic categories (70 of them annotated instance-specifically), distributed as visualised in the histograms below.

Histogram of stuff-category objects in Mapillary Vistas 2.0.

Histogram of thing-category objects in Mapillary Vistas 2.0.
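The class definitions themselves ship with the download, so tallies like the ones above can be reproduced directly. A small sketch, assuming the release includes a JSON config listing every label with an instance-annotation flag (the file name and schema here are assumptions about the download layout):

```python
import json

# Assumed: the download ships a JSON config enumerating all labels,
# each with a boolean flag for instance-specific annotation.
with open("config_v2.0.json") as f:  # file name is illustrative
    config = json.load(f)

labels = config["labels"]
instance_labels = [l for l in labels if l.get("instances")]

print(f"{len(labels)} semantic categories")            # expected: 124
print(f"{len(instance_labels)} with instance masks")   # expected: 70
```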

In addition to providing more detailed class labels, we are also releasing the raw annotations, which encode the approximate depth order of segments towards the camera position. The annotation protocol requires annotators to first draw polygons for the most distant object class in each image (e.g. sky) before adding objects that are closer to the camera.
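Because segments are stored in drawing order, rasterising them back-to-front yields both a label map and an approximate per-pixel depth rank. A minimal sketch, assuming the raw annotations can be parsed into (polygon, class_id) pairs in drawing order (that format is an assumption, not the documented schema):

```python
from PIL import Image, ImageDraw

def render_depth_ordered(width, height, segments):
    """Rasterise polygons in their annotated drawing order.

    `segments` is assumed to be a list of (polygon, class_id) pairs,
    most distant segment first. Painting in that order means closer
    objects overwrite farther ones, so the per-pixel rank doubles as
    an approximate depth order towards the camera.
    """
    label_map = Image.new("I", (width, height), 0)   # per-pixel class IDs
    depth_rank = Image.new("I", (width, height), 0)  # per-pixel depth rank
    ld, dd = ImageDraw.Draw(label_map), ImageDraw.Draw(depth_rank)
    for rank, (polygon, class_id) in enumerate(segments, start=1):
        ld.polygon(polygon, fill=class_id)
        dd.polygon(polygon, fill=rank)  # higher rank == closer to camera
    return label_map, depth_rank
```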

Vistas 2.0 is now available for direct download here. It contains:

  • All RGB images for training/validation/test
  • The previous Mapillary Vistas v1.2 labels
  • The new set of labels introduced in this post
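A minimal loading sketch, assuming the archive unpacks into per-split folders with separate label directories for the two annotation versions (the folder names below are assumptions, not taken from the release notes):

```python
from pathlib import Path
from PIL import Image

# Assumed directory layout after unpacking the download.
ROOT = Path("mapillary-vistas")

def load_sample(split, image_id, version="v2.0"):
    """Return an RGB image together with one of the two label versions."""
    image = Image.open(ROOT / split / "images" / f"{image_id}.jpg")
    labels = Image.open(ROOT / split / version / "labels" / f"{image_id}.png")
    return image, labels

# Hypothetical image ID, for illustration only.
image, labels = load_sample("training", "some_image_id")
```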

We hope that Mapillary Vistas 2.0 will contribute to further improvements in scene understanding and fine-grained object detection on street-level data.
