Introducing the Mapillary Traffic Sign Dataset for Teaching Machines to Understand Traffic Signs Globally

Today we’re releasing the Mapillary Traffic Sign Dataset, the world’s most diverse publicly available dataset of traffic sign annotations on street-level imagery, designed to help improve traffic safety and navigation everywhere. Covering different regions, weather and light conditions, camera sensors, and viewpoints, it supports the development of high-performing traffic sign recognition models for both academic and commercial research.


Traffic signs are a key tool for traffic regulation, affecting all of us daily through navigation and traffic safety. With the development of autonomous driving, it’s crucial that not only humans but also machines are able to accurately perceive and understand traffic signs. Automating the recognition of traffic signs is essential to teaching this skill to vehicles, and it underpins a wide range of applications. In the era of deep learning, large amounts of diverse training data are a key ingredient for training neural networks to detect and recognize traffic signs.

To enable the development of accurate and robust algorithms for traffic sign detection and classification, we have designed and compiled the Mapillary Traffic Sign Dataset—the first dataset to cover the global diversity of traffic signs with variation in appearance and geographic extent. We have carefully designed a manual annotation workflow and extended it with an algorithmic method to obtain high-quality bounding box annotations of traffic signs in all selected images.

  • 100,000 high-resolution images from all over the world with bounding box annotations of over 300 classes of traffic signs.
    • Over 52,000 images with traffic signs that are manually verified and annotated (~36,000 images for training, ~5,000 images for validation, and ~11,000 images for testing).
    • In addition, over 48,000 partially annotated images, where labels are generated automatically by propagating annotations to neighboring images using correspondence information from 3D reconstruction.

  • Global geographic reach, covering North and South America, Europe, Africa, Asia, and Oceania.

  • High variability in weather conditions (sun, rain, snow, fog, haze) and capture times (dawn, daylight, dusk, and even night).

  • A broad range of camera sensors (including panoramic cameras), varying focal lengths, image aspect ratios, and types of camera noise, as well as different capture viewpoints (from the road, sidewalks, and off-road).

Bounding box annotations from the Mapillary Traffic Sign Dataset on street-level images
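If you want to get a feel for the annotations programmatically, here is a minimal sketch that reads one image’s annotation file and draws its bounding boxes. It assumes a per-image JSON file with an "objects" list and xmin/ymin/xmax/ymax pixel coordinates; please check the dataset documentation for the exact schema, as the field names here are assumptions for illustration.

```python
# Minimal sketch for inspecting one image's annotations.
# Assumed (hypothetical) layout: {"objects": [{"label": ..., "bbox": {"xmin": ..., ...}}, ...]}
import json
from PIL import Image, ImageDraw

def load_annotation(json_path):
    """Parse the per-image annotation file."""
    with open(json_path) as f:
        return json.load(f)

def draw_boxes(image_path, annotation):
    """Return a copy of the image with every annotated traffic sign box drawn on it."""
    image = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    for obj in annotation.get("objects", []):
        box = obj["bbox"]  # assumed keys: xmin, ymin, xmax, ymax (pixel coordinates)
        draw.rectangle([box["xmin"], box["ymin"], box["xmax"], box["ymax"]],
                       outline=(255, 0, 0), width=3)
        draw.text((box["xmin"], max(0, box["ymin"] - 12)),
                  obj.get("label", "sign"), fill=(255, 0, 0))
    return image

# Example usage with placeholder file names:
# annotation = load_annotation("annotations/IMAGE_KEY.json")
# draw_boxes("images/IMAGE_KEY.jpg", annotation).save("IMAGE_KEY_boxes.jpg")
```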

Why this dataset

It might seem like traffic signs look fairly standardized and should be relatively straightforward to identify, but in practice the task is still quite challenging. Signs can easily be confused with banners and billboards, be damaged or occluded by trees and other objects, or be hardly visible in low-light conditions or because of reflectivity. They are also relatively small compared to other objects in a street scene, which makes them particularly difficult to recognize from a distance. And last but not least, it’s not always easy to tell whether a sign is a variant within the same sign class or belongs to an entirely different class with a different meaning.

There have been many efforts to create good traffic sign datasets for training purposes. However, the existing publicly available datasets are either limited in scale or lack diversity in geographical distribution, sensor types, weather and light conditions, and so on. This seriously limits the possibility of training high-performing deep learning models and of comprehensively benchmarking traffic sign detection and recognition systems.

We built this dataset because we need solid training data ourselves. With more than 570 million images on the Mapillary platform, uploaded by people from every corner of the world, together we’ve built an ideal platform for selecting images for a training dataset.

We’ve fine-tuned the process of annotating traffic signs in this imagery to ensure the quality of data (read more about quality assurance below). In addition to human annotations, the dataset also contains images with machine-generated partial annotations, with which we hope to pave the way for future research in semi-supervised learning. We’ve made the dataset available for both academic and commercial research because we believe that sharing data is crucial for increasing global traffic safety as well as pushing the boundaries of computer vision research.

Visit the Dataset page to learn more and access the data →


Statistics

The fully annotated set of the Mapillary Traffic Sign Dataset (MTSD) includes a total of 52,453 images with 257,543 traffic sign bounding boxes. The additional, partially annotated set contains 47,547 images with more than 80,000 signs that are automatically labeled using correspondence information from 3D reconstruction.

To put the scale and diversity of the dataset into perspective, we compare it with previous datasets from the research community, in this case the Tsinghua-Tencent 100K traffic sign dataset (TT100K). TT100K is a country-specific traffic sign dataset with images collected in China; it contains 10,000 images with traffic signs and 90,000 background images without any traffic signs.

The left plot in the figure below compares the traffic sign class distributions of MTSD and TT100K. MTSD has approximately twice as many traffic sign classes as TT100K. The middle plot compares sign areas in pixels at the original resolution of the containing image. MTSD covers a broad range of traffic sign sizes with an almost uniform distribution up to 256 px, and compared to TT100K it provides a higher fraction of extreme sizes, which helps address one of the big challenges in traffic sign detection.

Finally, the plot on the right shows the distribution of images over the number of signs per image. In addition to containing more images overall, MTSD also has a larger fraction of images with many traffic signs.

Left: number of traffic sign classes. Middle: number of signs binned by size. Right: number of images binned by number of traffic sign instances.
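If you want to reproduce statistics like these on your own copy of the data, the short sketch below gathers class frequencies, bounding box areas in pixels, and the number of signs per image. It uses the same assumed per-image JSON layout as the earlier example, so adjust the field names to the actual schema.

```python
# Sketch of the statistics behind the plots above, computed from a directory of
# per-image annotation files (assumed JSON layout as in the earlier example).
import json
from collections import Counter
from pathlib import Path

def dataset_statistics(annotation_dir):
    class_counts = Counter()   # traffic sign class -> number of instances
    sign_areas = []            # bounding box areas in pixels
    signs_per_image = []       # number of annotated signs in each image
    for path in Path(annotation_dir).glob("*.json"):
        objects = json.loads(path.read_text()).get("objects", [])
        signs_per_image.append(len(objects))
        for obj in objects:
            class_counts[obj.get("label", "unknown")] += 1
            box = obj["bbox"]
            sign_areas.append((box["xmax"] - box["xmin"]) * (box["ymax"] - box["ymin"]))
    return class_counts, sign_areas, signs_per_image
```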

Compared to a global dataset like the Mapillary Vistas Dataset, MTSD contains both more images and more annotated traffic signs. More importantly, MTSD is specifically designed to provide detailed class and attribute annotations for traffic signs, which Vistas does not include.

Geographically, the figure below shows that MTSD images are distributed globally with good coverage across countries. This is essential for ensuring diversity of traffic sign classes in both appearance and scene characteristics, which is key to models that generalize well when the dataset is used to train machine learning algorithms for diverse applications.

Geographic distribution of the images in the Mapillary Traffic Sign Dataset

Quality control

An essential aspect of a benchmark or training dataset at this scale is the quality of the annotations.

To ensure annotation quality, we paired the annotation work with a continuous quality control (QC) process so that problems could be identified quickly during annotation. We also introduced a second-stage quality assurance (QA) review, in which an independent annotator re-checked 5,000 random images containing 26,000 signs. We found that only 0.5% of the bounding boxes needed correction; the false negative rate was 0.89% (mostly very small signs), while the false positive rate was 2.45%.
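To illustrate how such a review can be scored, the sketch below compares the original boxes of an image with an independent second pass by intersection over union (IoU): original boxes without a counterpart in the review count towards false positives, and review boxes without a counterpart towards false negatives. The greedy matching and the 0.5 IoU threshold are assumptions for illustration rather than the exact procedure used for MTSD.

```python
# Hedged sketch of scoring an annotation review pass against the original boxes.
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def review_rates(original, review, threshold=0.5):
    """Return (false_positive_rate, false_negative_rate) for one image."""
    matched = set()
    false_positives = 0
    for box in original:
        # Best-overlapping review box that has not been matched yet (greedy).
        candidates = [i for i in range(len(review)) if i not in matched]
        best = max(candidates, key=lambda i: iou(box, review[i]), default=None)
        if best is not None and iou(box, review[best]) >= threshold:
            matched.add(best)
        else:
            false_positives += 1
    false_negatives = len(review) - len(matched)
    return (false_positives / max(len(original), 1),
            false_negatives / max(len(review), 1))
```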

Impact on trained models

Given the scale and diversity of the dataset, we have also studied its effectiveness for transfer learning. Specifically, we want to show how the global annotations in MTSD can improve traffic sign recognition in a country-specific application, in this case evaluated on the TT100K dataset for China. For the detection task of localizing all traffic sign bounding boxes in an image, we chose the popular detection baseline Faster R-CNN with a Feature Pyramid Network (FPN) and a ResNet-50 backbone.

We evaluated two pre-training setups: one with the ImageNet dataset and one with MTSD. ImageNet is the dataset most commonly used in transfer learning for pre-training the feature layers of a neural network. In both setups, we then fine-tuned the network on TT100K.
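As a rough illustration of this setup, the sketch below uses torchvision to build a Faster R-CNN detector with a ResNet-50 FPN backbone, optionally loads detector weights pre-trained on MTSD from a hypothetical checkpoint file, swaps the classification head for the TT100K label set, and runs a basic fine-tuning loop. It sketches the general recipe under these assumptions rather than our exact training configuration; schedules, augmentations, and hyperparameters are omitted.

```python
# Sketch: Faster R-CNN (ResNet-50 + FPN) fine-tuned on TT100K, optionally starting
# from MTSD pre-trained detector weights. Checkpoint path and data loader are placeholders.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def build_detector(num_classes, mtsd_checkpoint=None):
    """ImageNet-initialized backbone by default; optionally overwritten with MTSD weights."""
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
        weights=None, weights_backbone="IMAGENET1K_V1")
    if mtsd_checkpoint is not None:
        # Hypothetical checkpoint of a detector trained on MTSD; drop the class-specific
        # head weights because the TT100K label set is different.
        state = torch.load(mtsd_checkpoint, map_location="cpu")
        state = {k: v for k, v in state.items()
                 if not k.startswith("roi_heads.box_predictor")}
        model.load_state_dict(state, strict=False)
    # Replace the box predictor so the output layer matches the TT100K classes.
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model

def fine_tune(model, tt100k_loader, epochs=12, lr=0.005):
    """Basic loop; tt100k_loader must yield (images, targets) in torchvision's detection format."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device).train()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=1e-4)
    for _ in range(epochs):
        for images, targets in tt100k_loader:
            images = [img.to(device) for img in images]
            targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
            loss = sum(model(images, targets).values())  # sum of the detection losses
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```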

The binary detection baseline pre-trained with MTSD outperformed the one pre-trained with ImageNet by an absolute improvement of 6.3% in average precision; in the multi-class setup we saw an absolute improvement of 6.1%. This is a substantial performance boost for traffic sign detection.

                        AP      AP (small)   AP (medium)   AP (large)
ImageNet pre-training   91.27   84.01        95.87         90.13
MTSD pre-training       97.60   93.13        99.03         98.44

Binary detection results

                        mAP    mAP (small)   mAP (medium)   mAP (large)
ImageNet pre-training   89.6   83.9          93.0           84.3
MTSD pre-training       95.7   91.3          96.9           96.7

Multi-class detection results

Accessing the Mapillary Traffic Sign Dataset

The Mapillary Traffic Sign Dataset is available both as a commercial edition and a research edition. Go to the dataset page to learn more and access the dataset. Currently, you can download a sample of the images, and we intend to enhance the dataset page with the possibility to browse the whole dataset interactively.

We’re looking forward to seeing the research results achieved with the Mapillary Traffic Sign Dataset, and their applications for improving traffic safety and navigation, particularly in relation to self-driving. If you have any questions or comments, please get in touch with us.

/Christian, Computer Vision Engineer

Visit the Mapillary Traffic Sign Dataset page to get started →
