Automatically Detecting Building Entrances with Mapillary

Researchers from the University of California, Santa Cruz are working on enhancing navigation data by automatically extracting precise entrance locations for buildings using Mapillary and open-source vision models. Training on 750 images, the team achieved a precision of 83.3% and a recall of 62%, resulting in an accurate detection model aimed at improving accessibility and data coverage. Read on to learn how this partnership with UCSC, part of the Terraforma programme by the Overture Maps Foundation, is leveraging modern foundation models and AI to improve geospatial data.
Evan Rantala
Julien Howard
14 January 2026

Building entrance detected using a trained YOLOv8 model and Mapillary imagery.

Why Building Entrances Matter

Most navigation systems can tell you where a building is, but not where its entrance is. In dense urban environments this is a real frustration: pedestrians are routed to the street across from a building instead of to a usable entrance. Building entrances are a critical missing layer in open geospatial data. They’re also something satellite imagery simply can’t capture.

Entrances detected using the trained model, shown in red

Project Overview

Our goal was simple in principle, but challenging in practice: Automatically extract precise building entrance coordinates from street-level imagery and associate them with real buildings. This requires reasoning through noisy camera metadata, partial or occluded views of entrances, and ambiguity when multiple buildings appear in a single image.

To address this, we built an end-to-end, open-source pipeline that retrieves street-level imagery, detects entrances in those images, and assigns each detection to its associated building—all through the Mapillary API. The result is a system that can turn raw imagery into structured, mappable entrance data.
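The imagery-retrieval step can be sketched as a bounding-box query against the Mapillary Graph API. The endpoint and field names below (`graph.mapillary.com/images`, `computed_geometry`, `compass_angle`) follow our reading of Mapillary API v4; treat them as assumptions and verify against the current API documentation before relying on them.

```python
from urllib.parse import urlencode

# Hypothetical bounding box around one block (min_lon, min_lat, max_lon, max_lat).
BBOX = (-122.4205, 37.7745, -122.4180, 37.7760)

def build_image_query(bbox, token="YOUR_ACCESS_TOKEN"):
    """Build a Mapillary Graph API v4 image-search URL.

    The requested fields feed the later geometric-reasoning stage:
    `computed_geometry` gives the camera's GPS position and
    `compass_angle` its heading. Field names are assumptions based
    on API v4 documentation.
    """
    params = {
        "access_token": token,
        "bbox": ",".join(f"{v:.6f}" for v in bbox),
        "fields": "id,computed_geometry,compass_angle,captured_at",
    }
    return "https://graph.mapillary.com/images?" + urlencode(params)

url = build_image_query(BBOX)
print(url)
```

Each returned image then becomes a candidate for entrance detection in the next stage.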

The challenge in creating this system lies not only in detecting entrances, but in determining which building a detected entrance belongs to. Entrances vary widely in appearance, and street-level images often contain multiple buildings in view. Early approaches that matched detections to the nearest building by distance alone frequently produced incorrect associations.

Entrances detected with Precision scores

To overcome this, we created a custom dataset of Mapillary images labeled with building entrances and fine-tuned a commonly used open-source object detection model, YOLOv8, specifically for entrance detection. Detection alone, however, is not sufficient, so we designed the pipeline around explicit geometric reasoning. By combining camera metadata (such as bearings and field of view) with building footprints, we can estimate where a detected entrance lies on the ground and match it to the building whose footprint best aligns with that distance and direction. We implement this step using a local coordinate transformation powered by the open-source PROJ framework.
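Our pipeline performs the local transformation with PROJ; as a dependency-free sketch of the same idea, an equirectangular approximation centered on the camera is accurate to well under a meter over the street-scale distances involved. The function and constant names here are ours, for illustration.

```python
import math

EARTH_RADIUS_M = 6_378_137.0  # WGS84 semi-major axis

def to_local_xy(lon, lat, cam_lon, cam_lat):
    """Project a lon/lat point into a local east/north frame (meters)
    centered on the camera.

    This equirectangular approximation stands in for the PROJ-based
    transform used in the pipeline; it is adequate for the
    tens-of-meters ranges involved in entrance matching.
    """
    d_lon = math.radians(lon - cam_lon)
    d_lat = math.radians(lat - cam_lat)
    x_east = d_lon * math.cos(math.radians(cam_lat)) * EARTH_RADIUS_M
    y_north = d_lat * EARTH_RADIUS_M
    return x_east, y_north

# A point 0.001 degrees of latitude north of the camera lands at
# roughly (0, 111) meters in the local frame.
x, y = to_local_xy(-122.4200, 37.7755, -122.4200, 37.7745)
```

Working in meters rather than degrees makes the distance and direction comparisons in the matching step direct and interpretable.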

Conceptually, our approach can be thought of as standing at the camera location, drawing a line in the direction where the model detected an entrance, and finding which building that line lands on. This geometric constraint helps filter out implausible matches and makes the system’s outputs interpretable. In our pipeline, finding entrances with deep learning becomes a tool inside a larger spatial system, producing interpretable results.
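In the local metric frame, "drawing a line and finding which building it lands on" reduces to ray–segment intersection against footprint edges. A minimal sketch (coordinates in meters, bearings in degrees clockwise from north; helper names are our own, not the pipeline's actual API):

```python
import math

def ray_hits_footprint(origin, bearing_deg, footprint):
    """Cast a ray from `origin` (x_east, y_north) along a compass bearing
    and return the distance in meters to the nearest intersected wall of
    `footprint` (a list of (x, y) vertices), or None if the ray misses.
    """
    # Compass bearing: 0 deg = north (+y), 90 deg = east (+x).
    theta = math.radians(bearing_deg)
    dx, dy = math.sin(theta), math.cos(theta)
    ox, oy = origin
    best = None
    pts = list(footprint)
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        ex, ey = x2 - x1, y2 - y1
        denom = dx * ey - dy * ex
        if abs(denom) < 1e-12:          # ray parallel to this wall
            continue
        # Solve origin + t * d == wall_start + u * e for t (ray) and u (wall).
        t = ((x1 - ox) * ey - (y1 - oy) * ex) / denom
        u = ((x1 - ox) * dy - (y1 - oy) * dx) / denom
        if t > 0 and 0.0 <= u <= 1.0 and (best is None or t < best):
            best = t
    return best

# A 10 m x 10 m footprint whose front wall sits 10 m due north of the camera.
square = [(-5, 10), (5, 10), (5, 20), (-5, 20)]
dist = ray_hits_footprint((0.0, 0.0), 0.0, square)   # hits the front wall
```

Running this against every nearby footprint and keeping the closest hit implements the "most plausible match" selection; a detection whose ray misses every candidate wall is rejected outright.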

Key Challenges

  • Occlusions and Partial Views: Street-level imagery captures real-world conditions: parked cars block doorways, trees obscure facades, and pedestrians walk in front of entrances. While our YOLOv8 model can detect many partially visible entrances, heavy occlusions remain a limitation. We mitigate this by filtering low-confidence detections to reduce false positives.
  • Multi-Building Ambiguity: In dense urban areas, a single image often shows more than one building. When the model detects an entrance, which building does it belong to? This is where our geometric reasoning approach proved critical. By projecting detections along camera sight lines and comparing distances to building footprints, we can identify the most plausible match. The system rejects detections that don't align reasonably with any nearby wall.
  • Limited Imagery Coverage: The system's effectiveness depends entirely on Mapillary's coverage. Even in urban areas, coverage is often patchy and uneven, depending on where contributors happened to capture imagery. We address this by allowing users to specify minimum capture dates, ensuring they're working with the freshest available data.
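The confidence filtering mentioned in the first bullet is a single threshold pass. A sketch with a hypothetical detection structure (the tuple layout and threshold value are ours, not the actual YOLOv8 output schema):

```python
# Each detection: (class_name, confidence, bounding_box).
# The structure is illustrative, not the real YOLOv8 result format.
detections = [
    ("entrance", 0.91, (412, 230, 488, 360)),
    ("entrance", 0.34, (610, 250, 640, 340)),  # likely a false positive
    ("entrance", 0.77, (120, 215, 180, 355)),
]

CONF_THRESHOLD = 0.5  # tunable; raising it trades recall for precision

def filter_detections(dets, threshold=CONF_THRESHOLD):
    """Drop low-confidence detections to reduce false positives."""
    return [d for d in dets if d[1] >= threshold]

kept = filter_detections(detections)
```

The threshold is the main knob for balancing missed entrances against spurious ones.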

How the Pipeline Works

Pipeline diagram to detect Building entrances

  1. The user specifies a geographic area of interest, and all relevant data from Overture Maps Foundation’s Buildings and Places datasets is retrieved for that region.
  2. We then query nearby Mapillary imagery within the same radius to be used as candidates with potential entrances.
  3. A fine-tuned YOLOv8 model is applied to each image to detect building entrances.
  4. To connect each detected entrance in an image with real-world map data, we estimate where that entrance lies on the ground relative to the camera. Using the camera’s GPS position and viewing direction, we cast a ray outward from the camera through the detected doorway, effectively turning a pixel in the image into a direction and distance in the real world. This allows us to work in a simple, local map coordinate system measured in meters rather than latitude and longitude.
  5. Finally, because we already have building footprints from Overture Maps, each prediction point is matched to the building outline it most plausibly lies on, based on its position and viewing direction.
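Step 4's conversion of a pixel into a real-world direction can be sketched as follows, assuming a pinhole-style camera with a known horizontal field of view (an equirectangular panorama would need a different mapping); the function and parameter names are ours:

```python
import math

def pixel_to_bearing(px_x, image_width, cam_heading_deg, hfov_deg):
    """Convert a detection's horizontal pixel position into a compass
    bearing, assuming a pinhole camera with horizontal field of view
    `hfov_deg` and heading `cam_heading_deg` (degrees clockwise from
    north, as in Mapillary's compass angle).
    """
    # Offset of the pixel from the image center, as a fraction in [-0.5, 0.5].
    frac = px_x / image_width - 0.5
    # Pinhole geometry: the angular offset follows tan, not a linear scale.
    half_fov = math.radians(hfov_deg) / 2.0
    offset_deg = math.degrees(math.atan(2.0 * frac * math.tan(half_fov)))
    return (cam_heading_deg + offset_deg) % 360.0

# Center of a 1920 px image looks straight along the camera heading;
# the right edge looks half the field of view further clockwise.
center = pixel_to_bearing(960, 1920, 45.0, 90.0)
edge = pixel_to_bearing(1920, 1920, 45.0, 90.0)
```

That bearing, cast as a ray from the camera's GPS position in the local metric frame, is what gets matched against building footprints in step 5.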

Performance and Accuracy

Our fine-tuned YOLOv8 model was trained on 750 labeled Mapillary images collected from San Francisco. On our validation set (consisting of images from Santa Cruz and Seattle), the model achieved:

  • Precision: 83.3% (of detected entrances, how many were real entrances)
  • Recall: 62% (of real entrances in images, how many we detected)

While the 62% recall means we miss some entrances (often due to occlusions or poor viewing angles), the 83% precision ensures that most detections we do make are legitimate entrances correctly localized to their buildings.
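For reference, both metrics follow directly from true-positive, false-positive, and false-negative counts. The counts below are hypothetical, chosen only to land near the reported figures, not the actual validation tallies:

```python
# Hypothetical counts (not the real validation tallies):
tp = 50   # detections that were real entrances
fp = 10   # detections that were not entrances
fn = 31   # real entrances the model missed

precision = tp / (tp + fp)   # 50 / 60 -> about 0.833
recall = tp / (tp + fn)      # 50 / 81 -> about 0.617
```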

Example: Detected Buildings (Green) and Entrances (Red) in San Francisco. 50 entrances identified; average localization error: 4.26 m

More importantly, manual validation of the results in geojson.io shows that detected entrances are consistently matched to the correct buildings. In most cases, the remaining distance error arises from estimating depth from a single image, which can slightly over- or under-estimate how far an entrance is from the camera. The estimated direction remains accurate, allowing entrances to be placed along the correct building even when the final point is offset by a few meters.

Future Work

As with any street-level approach, results depend on imagery coverage and metadata accuracy. Entrances may be occluded, relocated, or absent from available imagery, and small errors in camera orientation can introduce localization noise. To mitigate these issues, our pipeline leverages detector confidence scores from the entrance detection model.

While this implementation operates over relatively small geographic regions, the approach is inherently scalable: larger areas can be processed by applying the same pipeline iteratively across adjacent regions.

Beyond building entrances, the framework is designed to be feature-agnostic. Additional detectors could be used to identify wheelchair-accessible entrances, ramps, or emergency exits, while relying on the same spatial matching logic to place these features on the map.

Why This Matters

Most importantly, this project demonstrates that combining open imagery and open maps can produce new, valuable geospatial layers that don’t exist today. It shows what’s possible when developers combine Mapillary imagery, open building datasets, computer vision, and spatial reasoning in a fully automated and reproducible pipeline.

High-quality map attributes should not require manual collection. If you’re a developer interested in spatial computing, open data, or computer vision, you can start building today with Mapillary’s developer resources.

Links and Contact