Building Entrance Detection Using a Trained YOLOv8 Model and Mapillary Imagery
Most navigation systems can tell you where a building is, but not where the entrance of that building is. In dense urban environments this is a real frustration: pedestrians are routed to the street across from a building instead of to a usable entrance. Building entrances are a critical missing layer in open geospatial data. They’re also something satellite imagery simply can’t capture.
Entrances detected by the trained model (red)
Our goal was simple in principle, but challenging in practice: Automatically extract precise building entrance coordinates from street-level imagery and associate them with real buildings. This requires reasoning through noisy camera metadata, partial or occluded views of entrances, and ambiguity when multiple buildings appear in a single image.
To address this, we built an end-to-end, open-source pipeline that retrieves street-level imagery, detects entrances in those images, and assigns each detection to its associated building—all through the Mapillary API. The result is a system that can turn raw imagery into structured, mappable entrance data.
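As a sketch of the first stage, imagery retrieval, the helper below builds the query parameters for a street-level image search over a bounding box. The function name is ours, and the endpoint and field names are assumptions based on Mapillary's Graph API v4; check the official documentation before relying on them.

```python
# Hypothetical helper for the imagery-retrieval step. The field names
# (id, geometry, compass_angle) follow Mapillary's Graph API v4, but
# treat them as assumptions and verify against the current docs.

def image_search_params(access_token: str,
                        bbox: tuple[float, float, float, float],
                        fields=("id", "geometry", "compass_angle")) -> dict:
    """Build query parameters for a street-level image search.

    bbox is (min_lon, min_lat, max_lon, max_lat) in degrees.
    """
    return {
        "access_token": access_token,
        "bbox": ",".join(f"{v:.6f}" for v in bbox),
        "fields": ",".join(fields),
    }

# A request would then look roughly like:
#   requests.get("https://graph.mapillary.com/images",
#                params=image_search_params(token, bbox))
params = image_search_params("MLY|token", (-122.42, 37.77, -122.41, 37.78))
print(params["bbox"])
```

The `compass_angle` field matters later: it is the camera bearing that the geometric matching stage depends on.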
The challenge in creating this system lies not only in detecting entrances, but in determining which building a detected entrance belongs to. Entrances vary widely in appearance, and street-level images often contain multiple buildings in view. Early approaches that matched detections to the nearest building by distance alone frequently produced incorrect associations.
Entrances detected, with precision scores
To overcome this, we created a custom dataset of Mapillary images labeled with building entrances and fine-tuned a commonly used open-source object detection model, YOLOv8, specifically for entrance detection. Detection alone, however, is not sufficient, so we designed the pipeline around explicit geometric reasoning. By combining camera metadata (such as bearings and field of view) with building footprints, we can estimate where a detected entrance lies on the ground and match it to the building whose footprint best aligns with that distance and direction. We implement this step using a local coordinate transformation powered by the open-source PROJ framework.
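To make the "estimate where a detected entrance lies on the ground" step concrete, here is a simplified stand-in for the PROJ-based local transformation: given the camera position, the compass bearing toward a detection, and an estimated depth, it places the point in geographic coordinates. It uses an equirectangular approximation (adequate over tens of meters); the function name and constant are ours, and the actual pipeline uses PROJ for this step.

```python
import math

# Simplified stand-in for the PROJ-based local coordinate transformation.
# Given the camera position, the compass bearing toward a detected
# entrance, and an estimated depth, place the entrance on the ground.
# Equirectangular approximation, fine over tens of meters.

METERS_PER_DEG_LAT = 111_320.0  # approximate meters per degree of latitude

def project_entrance(cam_lat: float, cam_lon: float,
                     bearing_deg: float, distance_m: float):
    """Return (lat, lon) of the point `distance_m` meters from the camera
    along `bearing_deg` (degrees clockwise from true north)."""
    theta = math.radians(bearing_deg)
    d_east = distance_m * math.sin(theta)   # meters east of the camera
    d_north = distance_m * math.cos(theta)  # meters north of the camera
    lat = cam_lat + d_north / METERS_PER_DEG_LAT
    lon = cam_lon + d_east / (METERS_PER_DEG_LAT * math.cos(math.radians(cam_lat)))
    return lat, lon
```

With bearing 0 (due north), only latitude changes; with bearing 90 (due east), only longitude does, scaled by the latitude's cosine.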
Conceptually, our approach can be thought of as standing at the camera location, drawing a line in the direction where the model detected an entrance, and finding which building that line lands on. This geometric constraint helps filter out implausible matches and makes the system’s outputs interpretable. In our pipeline, finding entrances with deep learning becomes a tool inside a larger spatial system, producing interpretable results.
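The ray-casting idea above can be sketched with plain 2D geometry. Working in a local metric frame centered on the camera (x = meters east, y = meters north), we cast a ray along the detection bearing and return the footprint it hits first. The names and data layout here are illustrative, not the pipeline's exact code.

```python
import math

# Illustrative sketch of "draw a line and see which building it lands
# on". Coordinates are in a local metric frame centered on the camera
# (x = meters east, y = meters north); footprints are lists of (x, y)
# vertices. Structure and names are ours, not the pipeline's exact code.

def _cross(ax, ay, bx, by):
    return ax * by - ay * bx

def ray_segment_distance(origin, direction, p1, p2):
    """Distance along the ray to segment p1-p2, or None if no hit."""
    ox, oy = origin
    dx, dy = direction
    sx, sy = p2[0] - p1[0], p2[1] - p1[1]
    denom = _cross(dx, dy, sx, sy)
    if abs(denom) < 1e-12:               # ray parallel to this edge
        return None
    wx, wy = p1[0] - ox, p1[1] - oy
    t = _cross(wx, wy, sx, sy) / denom   # distance along the ray
    u = _cross(wx, wy, dx, dy) / denom   # position along the segment
    if t >= 0.0 and 0.0 <= u <= 1.0:
        return t
    return None

def match_building(bearing_deg, footprints):
    """Return the index of the footprint first hit by the bearing ray."""
    theta = math.radians(bearing_deg)
    direction = (math.sin(theta), math.cos(theta))  # (east, north)
    best_dist, best_idx = math.inf, None
    for i, poly in enumerate(footprints):
        for j in range(len(poly)):
            d = ray_segment_distance((0.0, 0.0), direction,
                                     poly[j], poly[(j + 1) % len(poly)])
            if d is not None and d < best_dist:
                best_dist, best_idx = d, i
    return best_idx
```

Taking the nearest positive intersection is what filters out implausible matches: a building behind the camera, or behind another building, can never win.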
Pipeline diagram to detect Building entrances
Our fine-tuned YOLOv8 model was trained on 750 labeled Mapillary images collected from San Francisco. On our validation set (consisting of images from Santa Cruz and Seattle), the model achieved:

- Precision: 83%
- Recall: 62%
While the 62% recall means we miss some entrances (often due to occlusions or poor viewing angles), the 83% precision ensures that most detections we do make are legitimate entrances correctly localized to their buildings.
Example: Detected Buildings (Green) and Entrances (Red) in San Francisco. 50 entrances identified; average localization error: 4.26 m
More importantly, visual validation in geojson.io shows that detected entrances are consistently matched to the correct buildings. In most cases, the remaining distance error arises from estimating depth from a single image, which can slightly over- or under-estimate how far an entrance is from the camera. The estimated direction remains accurate, allowing entrances to be placed along the correct building even when the final point is offset by a few meters.
As with any street-level approach, results depend on imagery coverage and metadata accuracy. Entrances may be occluded, relocated, or absent from available imagery, and small errors in camera orientation can introduce localization noise. To mitigate these issues, our pipeline leverages detector confidence scores from the entrance detection model.
While this implementation operates over relatively small geographic regions, the approach is inherently scalable: larger areas can be processed by applying the same pipeline iteratively across adjacent regions.
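Scaling out that way amounts to tiling a large bounding box and running the same pipeline per tile; a hedged sketch (the function and grid size are ours) could look like:

```python
# Sketch of scaling to larger areas: split one big bounding box into an
# adjacent grid of tiles and run the same pipeline on each tile.
# Function name and grid dimensions are illustrative.

def tile_bbox(min_lon, min_lat, max_lon, max_lat, n_cols, n_rows):
    """Yield (min_lon, min_lat, max_lon, max_lat) tiles covering the box."""
    w = (max_lon - min_lon) / n_cols
    h = (max_lat - min_lat) / n_rows
    for r in range(n_rows):
        for c in range(n_cols):
            yield (min_lon + c * w, min_lat + r * h,
                   min_lon + (c + 1) * w, min_lat + (r + 1) * h)

# e.g. a San Francisco-sized box split into a 4 x 3 grid
tiles = list(tile_bbox(-122.52, 37.70, -122.36, 37.82, 4, 3))
print(len(tiles))  # 12 tiles covering the original box
```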
Beyond building entrances, the framework is designed to be feature-agnostic. Additional detectors could be used to identify wheelchair-accessible entrances, ramps, or emergency exits, while relying on the same spatial matching logic to place these features on the map.
Most importantly, this project demonstrates that combining open imagery and open maps can produce new, valuable geospatial layers that don’t exist today. It shows what’s possible when developers combine Mapillary imagery, open building datasets, computer vision, and spatial reasoning in a fully automated and reproducible pipeline.
High-quality map attributes should not require manual collection. If you’re a developer interested in spatial computing, open data, or computer vision, you can start building today with Mapillary’s developer resources.