Introducing the Mapillary Street-Level Sequences Dataset for Lifelong Place Recognition
Today we’re releasing the Mapillary Street-Level Sequences Dataset, the world’s most diverse publicly available dataset for lifelong place recognition. The dataset enables benchmarking and training state-of-the-art machine learning models for large-scale lifelong place recognition as a direct result of its scale and diverse coverage in geographical areas, scene characteristics, and lighting conditions.
Place recognition is an essential task in computer vision, with wide-ranging applications in augmented reality and large-scale 3D reconstruction. Given a query image, the task is to find the most similar place in a database of geo-located images.
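To make the task concrete, here is a minimal sketch of place recognition as nearest-neighbor retrieval. It assumes every image has already been turned into a fixed-length descriptor vector by some model; the three-dimensional descriptors, the `database` entries, and the function names are hypothetical and chosen only for illustration, not taken from MSLS.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_descriptor, database, k=1):
    """Return the k database entries most similar to the query, best first."""
    ranked = sorted(
        database,
        key=lambda entry: cosine_similarity(query_descriptor, entry["descriptor"]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical geo-located database: each entry carries a position and a
# descriptor (real descriptors have thousands of dimensions, not three).
database = [
    {"id": "img_a", "lat": 35.6595, "lon": 139.7005, "descriptor": [0.9, 0.1, 0.0]},
    {"id": "img_b", "lat": 37.7749, "lon": -122.4194, "descriptor": [0.1, 0.9, 0.2]},
    {"id": "img_c", "lat": 55.6761, "lon": 12.5683, "descriptor": [0.0, 0.2, 0.9]},
]

query = [0.2, 0.8, 0.1]
best = retrieve(query, database, k=1)[0]
# The location of the best match serves as the location estimate for the query.
print(best["id"], best["lat"], best["lon"])
```

The hard part in practice is not the search but learning descriptors that stay close for the same place across seasons, weather, and cameras, which is what a diverse training set is for.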
Images from the dataset, captured at the same location at different times of day and/or year
To advance the state of the art in lifelong place recognition, we have created the Mapillary Street-Level Sequences dataset (MSLS), a collection of visually and geographically diverse image sequences bundled with rich metadata for training place recognition algorithms, including inter-sequence matching. The dataset features:
- More than 1.6 million images
- 30 major cities, ranging from Tokyo to San Francisco, across six continents
- All images tagged with sequence information, geo-located with GPS and compass angles
- Various capture times spanning all seasons over a nine-year period
- Different weather, cameras, daylight conditions, and structural settings
Each column shows two frames marked as ‘same place’ in MSLS, even though they were captured by different users, at different times of day or even different seasons, leading to very different appearances. Automatically detecting places when their appearance is so different is a challenging problem. MSLS can be used to train algorithms that are robust to such changes.
The role of place recognition and why it matters
Place recognition is an important component in many large-scale computer vision and robotics systems. To develop and benchmark place recognition, it is essential to have a dataset that is representative of diverse scenarios. Existing datasets have contributed broadly to the development of place recognition methods, but their limited scale and their limited diversity in scene characteristics and appearance changes hold back progress, particularly for methods based on deep learning. With this in mind, we have designed and curated the MSLS dataset to be large and varied enough to train deep learning-based place recognition algorithms that also work outside of the lab.
Building the Mapillary Street-Level Sequences Dataset
The imagery on Mapillary is vast in terms of numbers and variety. More than one billion images have been uploaded from 190+ countries, at different times of day and year, using cameras ranging from smartphones to professional rigs. That's why the Mapillary imagery database is perfect for building a place recognition dataset. To collect a manageable but diverse dataset, we searched for imagery in 30 cities. To maximize visual diversity, we only used one sequence per Mapillary user, and looked for pairs of neighboring sequences that are captured at different times of the day, dates and seasons. We summarized the time distribution of MSLS in these charts:
The scale and diversity of the dataset allow for the training of place recognition systems that are robust to extreme appearance changes. The scope and variety of MSLS are only made possible thanks to the contributions of the Mapillary community. We owe a big thanks to the community for uploading millions of images from all over the world every single day. Together, we push the boundaries of what’s possible in the field of computer vision.
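The selection rules described in this section (one sequence per user, then pairs of neighboring sequences captured at different times) can be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline we actually ran: the record fields, the single reference position per sequence, and the 50 m and 24 hour thresholds are all hypothetical.

```python
import math
from datetime import datetime


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    earth_radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def one_sequence_per_user(sequences):
    """Keep only the first sequence seen for each user."""
    kept = {}
    for seq in sequences:
        kept.setdefault(seq["user"], seq)
    return list(kept.values())


def candidate_pairs(sequences, max_distance_m=50.0, min_hours_apart=24.0):
    """Pair sequences that are close in space but far apart in capture time.

    Both thresholds are assumed values for this sketch.
    """
    pairs = []
    for i, a in enumerate(sequences):
        for b in sequences[i + 1:]:
            distance = haversine_m(a["lat"], a["lon"], b["lat"], b["lon"])
            hours = abs((a["captured_at"] - b["captured_at"]).total_seconds()) / 3600.0
            if distance <= max_distance_m and hours >= min_hours_apart:
                pairs.append((a["id"], b["id"]))
    return pairs


# Hypothetical sequence records, each reduced to one reference position.
sequences = [
    {"id": "s1", "user": "u1", "lat": 55.6761, "lon": 12.5683, "captured_at": datetime(2015, 1, 10, 9, 0)},
    {"id": "s2", "user": "u2", "lat": 55.6762, "lon": 12.5684, "captured_at": datetime(2018, 7, 2, 21, 0)},
    {"id": "s3", "user": "u1", "lat": 55.6761, "lon": 12.5683, "captured_at": datetime(2019, 3, 5, 12, 0)},
    {"id": "s4", "user": "u3", "lat": 35.6595, "lon": 139.7005, "captured_at": datetime(2017, 5, 1, 8, 0)},
]

selected = one_sequence_per_user(sequences)
print(candidate_pairs(selected))
```

Here `s3` is dropped because user `u1` already contributed `s1`, and `s4` is too far away from the others to form a pair, so only `s1` and `s2` remain as a candidate pair.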
Benchmarking and establishing a new baseline
One of the reasons we are so excited about MSLS is that initial benchmarking results show that training on MSLS gives better results than training on previously available datasets. Previous datasets usually contain images from a limited geographical region (e.g. a single city). That makes it very challenging for methods trained on these datasets to perform well at place recognition on arbitrary images from elsewhere in the world. In more technical terms, methods trained on previous datasets generalize poorly.
In the following chart, we show that training only on MSLS yields state-of-the-art results on MSLS itself (unsurprisingly), but also on the RobotCar dataset and Tokyo24/7. It is also very close to the state of the art on Pitts250k. Conversely, training on the previously available Pitts250k (a dataset that only contains images from Pittsburgh) produces noticeably worse results on datasets that don’t depict dense urban environments (RobotCar and MSLS).
Comparison of the top-5 recall of NetVLAD-based models trained on Pitts250k and MSLS on four different datasets.
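For readers unfamiliar with the metric, top-5 recall is commonly computed as the fraction of queries for which at least one of the five highest-ranked database images is a correct match. The sketch below follows that common definition; the identifiers and toy rankings are hypothetical, and what counts as a correct match (typically a distance threshold around the query location) is an assumption that is defined per benchmark rather than here.

```python
def recall_at_k(ranked_results, ground_truth, k=5):
    """Fraction of queries with at least one correct match in the top k.

    ranked_results: dict mapping query id -> list of database ids, best first.
    ground_truth:   dict mapping query id -> set of correct database ids.
    Queries without any ground-truth match are skipped.
    """
    hits = 0
    evaluated = 0
    for query_id, ranked_ids in ranked_results.items():
        positives = ground_truth.get(query_id, set())
        if not positives:
            continue
        evaluated += 1
        if any(db_id in positives for db_id in ranked_ids[:k]):
            hits += 1
    return hits / evaluated if evaluated else 0.0


# Toy example with four queries (all ids are made up).
ranked_results = {
    "q1": ["d1", "d2", "d3", "d4", "d5", "d6"],  # correct match at rank 1
    "q2": ["d7", "d8", "d9", "d2", "d1", "d3"],  # correct match at rank 4
    "q3": ["d4", "d5", "d6", "d7", "d8", "d3"],  # correct match at rank 6
    "q4": ["d1", "d2", "d3", "d4", "d5", "d6"],  # correct match never retrieved
}
ground_truth = {"q1": {"d1"}, "q2": {"d2"}, "q3": {"d3"}, "q4": {"d9"}}

print(recall_at_k(ranked_results, ground_truth, k=1))  # 0.25
print(recall_at_k(ranked_results, ground_truth, k=5))  # 0.5
```

Recall can only grow as k increases, so reported numbers are only comparable when they use the same k and the same definition of a correct match.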
The paper will be published as an oral at CVPR later this year. We would like to acknowledge Frederik Warburg for his impressive work on this paper as part of his internship at Mapillary. To learn more about the creation of MSLS and how it compares to other datasets, visit our project website and read the full paper.
Accessing the Mapillary Street-Level Sequences Dataset
The Mapillary Street-Level Sequences Dataset is available both as a commercial edition and a research edition. Go to the dataset page to learn more and access the dataset. We can’t wait to see the research results achieved with the Mapillary Street-Level Sequences Dataset in lifelong place recognition, and their applications for AR, robotics, and change detection. If you have any questions or comments, please get in touch.
/Manuel, Computer Vision Engineer