Winning the ECCV Semantic Segmentation Challenge for Autonomous Navigation in Unstructured Environments

Mapillary just scored another success in computer vision benchmarking. This wouldn't be possible without our community: by contributing more than 350 million images from all over the world, they've created a dataset that helps build robust algorithms for making sense of any street scene that a self-driving car may encounter.

Mapillary just won the Semantic Segmentation Challenge for Autonomous Navigation in Unstructured Environments, co-located with this year’s European Conference on Computer Vision. The challenge was held to push performance for autonomous navigation in so-called unstructured environments.

This is important because most solutions so far have been developed using data from environments with organized traffic and ideal infrastructure, which is not what an autonomous car will always encounter on real-life streets. In different parts of the world, street scenes can look considerably different and infrastructure can be far from ideal.

What sets this scene understanding challenge apart from others is the nature of the provided dataset. It consists of roughly 10,000 images from India, from Hyderabad and Bangalore to be precise. Street scenes in Hyderabad and Bangalore differ from those in Europe and the US, which poses a challenge to any algorithm that has only been trained on data representing the latter areas.

One of the images presented during the challenge: the bottom one shows how our algorithm segmented the objects in the image

In this challenge, participants were expected to detect 26 object classes (humans, cars, vegetation, and so on) at pixel level in the imagery. We built our winning model on top of an algorithm trained on our Mapillary Vistas Dataset, the world's most diverse dataset of semantically segmented street-level imagery. The dataset includes imagery from India itself, as well as from places like Oman and Burkina Faso; in total, it covers six continents from which the Mapillary community has been contributing imagery.
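To make pixel-level prediction concrete: a segmentation network outputs a score per class for every pixel, and the predicted label map is the per-pixel argmax over those scores. Here is a minimal, purely illustrative PyTorch sketch; the one-layer toy model is a placeholder, not our actual architecture.

```python
import torch
from torch import nn

# Toy stand-in for a segmentation network: it maps a 3-channel image to
# per-pixel scores over NUM_CLASSES classes (the challenge used 26).
NUM_CLASSES = 26
toy_model = nn.Conv2d(3, NUM_CLASSES, kernel_size=1)

image = torch.randn(1, 3, 256, 512)   # batch, channels, height, width
logits = toy_model(image)             # shape: (1, 26, 256, 512)
label_map = logits.argmax(dim=1)      # shape: (1, 256, 512), one class id per pixel

print(label_map.shape)
```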

Because we've trained our algorithms on street-level images from all over the world, our models are already robust to wildly different street scenes. That is a great starting point for developing the algorithms further using additional data, such as the data provided in this challenge.

We've also managed to reduce the memory footprint of our segmentation models at training time by roughly 50% through our recent innovation called In-Place Activated BatchNorm. In-Place ABN frees up GPU memory and, consequently, allows processing more data at higher resolution. This helps improve results for dense prediction tasks like semantic segmentation, and is also what helped us win the CVPR Semantic Segmentation Challenge earlier this summer. In-Place ABN and the Vistas Dataset, in other words, are a winning combination.
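For the curious, here is a minimal PyTorch-style sketch of the idea: a conventional block keeps separate intermediate buffers for batch normalization and the activation, while a fused in-place module keeps a single one and recomputes what it needs during the backward pass. The `InPlaceABN` import and its arguments are assumptions based on the open-source inplace_abn package, not code from the winning model.

```python
import torch
from torch import nn

# Conventional block: BatchNorm2d and the activation each hold on to an
# intermediate buffer for the backward pass.
def conventional_block(channels):
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(channels),
        nn.LeakyReLU(0.01, inplace=True),
    )

# In-Place Activated BatchNorm fuses normalization and activation into one
# op that stores a single buffer, which is where the memory saving comes
# from. The import and constructor arguments below are assumptions based on
# the open-source inplace_abn package and may differ between versions.
try:
    from inplace_abn import InPlaceABN

    def memory_efficient_block(channels):
        return nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            InPlaceABN(channels, activation="leaky_relu"),  # fused BN + activation
        )
except ImportError:
    memory_efficient_block = conventional_block  # fall back if the package is absent

if __name__ == "__main__":
    x = torch.randn(2, 64, 128, 128)
    print(conventional_block(64)(x).shape)
    print(memory_efficient_block(64)(x).shape)
```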

Another example of an image presented during the challenge

The Vistas Dataset has helped Mapillary win numerous international benchmarking awards since we first released it last year. It's also used by academic institutions and automotive players all over the world to push the boundaries of what's possible as we teach machines to see. And it's all possible thanks to the Mapillary community that keeps contributing imagery at an exponential rate, from all over the world.

The bottom line of the story is this: collaboration literally wins.

The Mapillary Vistas Dataset is free to use for researchers across the globe—read more here. If you’re interested in a commercial license, contact us here.

/Samuel
