Winning at CVPR 2019: Mapillary Tops Two Computer Vision Benchmarking Challenges
We recently announced that Mapillary would present four papers at CVPR this year, and we are now proud to say that we also won two important benchmarking challenges at this year’s gathering of computer vision experts from around the world. I am happy and proud to work with such an exceptional research team.
Scene parsing with point-based supervision
Scene parsing with point-based supervision was organized by Vision Applications as part of the first Learning from Imperfect Data (LID) Challenge. Classifying the background categories of images (road, sky, wall, etc.) is a key problem in computer vision research and important for teaching machines to navigate space. Since manual annotation at the pixel level for this type of classification is difficult and expensive, our goal in this challenge was to learn a semantic segmentation model from very sparse point-based annotations—that is, from very poor training labels.
The GIF below illustrates what we mean by sparse annotations. The first image (appearing as mostly black pixels) shows the original imperfect label data, where only 31 pixels (0.007884% of the image) are actually labeled. Our approach was to iteratively learn and generate finer-grained annotations, eventually training on denser (but still imperfect) label data.
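The iterative densification described above follows the general pattern of self-training with pseudo-labels. Below is a minimal sketch of one such round, assuming placeholder `train_fn`/`predict_fn` hooks and `-1` as the "unlabeled" marker; it is an illustration of the general technique, not Mapillary's actual pipeline.

```python
# Hypothetical sketch of one self-training round: fit a model on the
# current (sparse) labels, predict dense per-pixel class probabilities,
# accept only confident predictions as pseudo-labels, and return the
# denser label maps for the next round of training.
import numpy as np

def self_training_round(train_fn, predict_fn, images, sparse_labels,
                        confidence_threshold=0.9):
    """Densify sparse point annotations with high-confidence predictions.

    sparse_labels: list of (H, W) int arrays, -1 where no label exists.
    predict_fn: returns (H, W, num_classes) class probabilities per image.
    """
    model = train_fn(images, sparse_labels)      # fit on current labels
    denser_labels = []
    for img, labels in zip(images, sparse_labels):
        probs = predict_fn(model, img)           # (H, W, num_classes)
        pred = probs.argmax(axis=-1)             # predicted class per pixel
        conf = probs.max(axis=-1)                # confidence per pixel
        new_labels = labels.copy()               # keep the original points
        unlabeled = labels < 0
        accept = unlabeled & (conf >= confidence_threshold)
        new_labels[accept] = pred[accept]        # add confident pseudo-labels
        denser_labels.append(new_labels)
    return model, denser_labels
```

Keeping the original point annotations untouched while only filling in confident predictions is what lets each round train on strictly denser, if still imperfect, supervision.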
Our results on the unseen test data were ~30% mAP, roughly matching what state-of-the-art approaches could produce from fully annotated images when the dataset was announced in 2017.
nuScenes 3D detection challenge
We also won the nuScenes 3D Detection Task at the Workshop on Autonomous Driving, organized by Berkeley, NVIDIA, MIT, and nuTonomy. The challenge was designed to assess the current state of computer vision algorithms for environmental perception in autonomous vehicles.
The goal was to predict 3D bounding boxes for 10 different classes (cars, motorcycles, pedestrians, traffic cones, etc.), class-specific attributes (such as whether a car is moving or parked), and the current velocity vector. Of the three possible submission tracks, we decided to take on the most challenging one, which restricts the input to vision only—just images, with no LiDAR, radar, or map information.
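To make the task outputs concrete, here is a minimal sketch of what a single prediction in this track carries: a 3D box, one of the 10 classes, a class-specific attribute, and a ground-plane velocity, all estimated from images alone. The field and attribute names are illustrative assumptions, not the official nuScenes submission schema.

```python
# Hypothetical container for one monocular 3D detection result.
# All geometry is expressed in meters/radians; names are illustrative.
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Detection3D:
    category: str                        # e.g. "car", "pedestrian"
    center: Tuple[float, float, float]   # box center x, y, z in meters
    size: Tuple[float, float, float]     # width, length, height in meters
    yaw: float                           # heading angle in radians
    attribute: str                       # e.g. "vehicle.parked" (assumed name)
    velocity: Tuple[float, float]        # vx, vy in m/s on the ground plane
    score: float                         # detection confidence in [0, 1]

# Example detection as a camera-only method might output it:
det = Detection3D("car", (10.0, 2.5, 0.9), (1.9, 4.6, 1.7),
                  0.3, "vehicle.moving", (5.2, 0.1), 0.87)
```

Estimating every one of these quantities without LiDAR or radar is what makes the vision-only track the hardest: depth (the `center` z-distance) and velocity must be inferred from image evidence alone.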
Want to know more?
A summary of our most recent technical report, Disentangling Monocular 3D Object Detection, is available on the Mapillary blog. If you would like to learn more about the four papers we presented at CVPR this year, see the links below.
1. Seamless Scene Segmentation
2. AdaGraph: Unifying Predictive and Continuous Domain Adaptation through Graphs
3. Unsupervised Domain Adaptation using Feature-Whitening and Consensus Loss
4. Deep Single Image Camera Calibration with Radial Distortion
/Peter Kontschieder, Director of Research at Mapillary