How to Use Mapillary Data in Jupyter Notebooks

Jupyter Notebook is popular among geospatial engineers and it's often used in conjunction with platforms like ArcGIS. This is how you can make the most out of Mapillary data in Jupyter Notebook.

If you’re working with geospatial data and Python, there is a good chance you use Jupyter Notebooks on a regular basis. The Jupyter Notebook environment is becoming increasingly popular among geospatial developers, who use it in conjunction with ArcGIS, GRASS GIS, Leaflet and more.

Mapillary is very easy to use in a Jupyter Notebook, particularly when working with the Mapillary APIs, which return point data as GeoJSON. Using Python code in Jupyter Notebooks, it becomes much easier to make reusable API requests, to employ pagination, and to organize data in different ways once it is requested from an API.

Access to the Mapillary APIs can depend on your data subscriptions. While the images and sequences APIs allow free and open access, you may also be interested in the map features API for retrieving traffic signs and other point data such as crosswalks, utility poles, or pavement markings.

In this blog post we’ll take a look at how to use Jupyter Notebooks and Python to download the location of a large number of images, to download sequence trace lines and visualize them, and to access the map features API.

Mapillary Images API

Let’s start by assuming you want to retrieve all images that you captured during November of 2019. To get started, we’ll need a few basic building blocks:

  • A client ID—sign up for yours by clicking Register application on the developer dashboard
  • The start and end dates in ISO format—in our case, 2019-11-01 and 2019-11-30
  • The username—mine is *chrisbeddow*
  • The API endpoint—for images, we’ll use https://a.mapillary.com/v3/images

With these in place, we can start our Jupyter Notebook:

import json, requests

# set our building blocks
client_id = 'ABChSnNFdGpxSEGGREUwb01FYzlXZzo4YjZkNmJjMWJlMTIzNzkz' # client ID
start_time = '2019-11-01' # string in ISO format
end_time = '2019-11-30' # string in ISO format
username = 'chrisbeddow'

# API call URL with sort_by=key which enables pagination, insert building blocks
url = ('https://a.mapillary.com/v3/images?client_id={}&usernames={}&start_time={}&end_time={}&per_page=500&sort_by=key').format(client_id,username,start_time,end_time)

# create an empty GeoJSON to collect all images we find
output = {"type":"FeatureCollection","features":[]}

with open('november_images.geojson', 'w') as outfile: #set output filename

    #print the API call, so we can click it to preview the first response
    print(url)

    # get the request with no timeout in case API is slow
    r = requests.get(url, timeout=None)

    # check if request failed, if failed, keep trying - 200 is a success
    while r.status_code != 200:
        r = requests.get(url, timeout=None)

    data = r.json() # get a JSON format of the response
    data_length = len(data['features']) # get a count of how many images 
    for feature in data['features']:
        output['features'].append(feature) # append each image to our empty geojson

    # if we receive 500 items, response was full and should be a next page
    while data_length == 500:

        # get the URL for a next page
        link = r.links['next']['url']

        # retrieve the next page in JSON format
        r = requests.get(link)

        # try again if the request fails, repeating the same page link rather than the first URL
        while r.status_code != 200:
            r = requests.get(link, timeout=None)

        data = r.json()

        for feature in data['features']:
            output['features'].append(feature)

        print('Total images: {}'.format(len(output['features']))) # print total count
        data_length = len(data['features']) #update data length

    # send collected features to the local file
    json.dump(output, outfile)

print('DONE') # once all images are pushed to a GeoJSON and saved, we finish

The output will print:

https://a.mapillary.com/v3/images?client_id=ABChSnNFdGpxSEGGREUwb01FYzlXZzo4YjZkNmJjMWJlMTIzNzkz&usernames=chrisbeddow&start_time=2019-11-01&end_time=2019-11-30&per_page=500&sort_by=key
Total images: 1000
Total images: 1500
Total images: 2000
Total images: 2500
Total images: 3000
Total images: 3500
Total images: 4000
Total images: 4500
Total images: 5000
Total images: 5126
DONE
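The loop above decides whether to continue by checking for a full page of 500 features. A variation is to follow the next link for as long as the response provides one. The sketch below is one way to do that, under the assumption that the API stops sending a next link (or sends an empty page) once the results run out. The name iter_features and the get argument are introduced here for illustration and are not part of the Mapillary API.

```python
def iter_features(first_url, get):
    """Yield every GeoJSON feature, following 'next' links page by page.

    `get` is any callable that takes a URL and returns a response object
    with `.json()` and `.links`, for example `requests.get`.
    """
    url = first_url
    while url:
        response = get(url)
        features = response.json().get('features', [])
        if not features:
            break  # an empty page means there is nothing left to collect
        for feature in features:
            yield feature
        # requests parses the Link header into `response.links`;
        # with no 'next' entry, url becomes None and the loop ends
        url = response.links.get('next', {}).get('url')
```

In the notebook this could be called as list(iter_features(url, requests.get)) to collect everything into a single list before writing the GeoJSON file.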

Once this is working, it’s easy to use the Mapillary API documentation to do other image searches, whether it’s for different dates, for images close to a particular longitude and latitude, or for all images in one area defined by a bounding box. These filters can be found in the images API documentation. Using them only requires changing the URL in the script, for example:

  • Get all images near the Berlin State Opera:

https://a.mapillary.com/v3/images?client_id=ABChSnNFdGpxSEGGREUwb01FYzlXZzo4YjZkNmJjMWJlMTIzNzkz&per_page=500&sort_by=key&closeto=13.394763887407521,52.51666127073133

  • Get all images inside of Hyde Park, in London:

https://a.mapillary.com/v3/images?client_id=ABChSnNFdGpxSEGGREUwb01FYzlXZzo4YjZkNmJjMWJlMTIzNzkz&per_page=500&sort_by=key&bbox=-0.1899433135986328,51.5015033911501,-0.1522207260131836,51.51392396328955
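Rather than editing the URL by hand for each search, the filters can be passed as keyword arguments. This is a minimal sketch using only the standard library. The helper name build_images_url and the placeholder YOUR_CLIENT_ID are assumptions made for the example, while the parameter names are the ones used in the URLs above.

```python
from urllib.parse import urlencode

IMAGES_ENDPOINT = 'https://a.mapillary.com/v3/images'

def build_images_url(client_id, **filters):
    """Return an images API URL with paging defaults plus any extra filters."""
    params = {'client_id': client_id, 'per_page': 500, 'sort_by': 'key'}
    params.update(filters)
    # safe=',' keeps the commas in coordinate lists instead of encoding them as %2C
    return IMAGES_ENDPOINT + '?' + urlencode(params, safe=',')

# images near the Berlin State Opera
print(build_images_url('YOUR_CLIENT_ID', closeto='13.394763887407521,52.51666127073133'))
```

The same helper covers the Hyde Park example by passing bbox='-0.1899433135986328,51.5015033911501,-0.1522207260131836,51.51392396328955' instead of closeto.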

Viewing images

Next, we’ll look at a briefer example from the images API, where we retrieve not just the map data but an actual image. This brings the street-level imagery to life in the code.

Let’s say we know a package has been delivered at a set of map coordinates, -118.481170,34.033302, in the format of longitude,latitude. We want to find out what is located there. We can plug these coordinates into the Mapillary API using the lookat parameter, which gives us images with a camera angle pointed toward this location, together with closeto and radius, which keep those images within a set distance of it. We can then get the image key and display a JPEG of the image for a quick preview.

import json, requests
from IPython.display import Image, display # import module to print an image

# set our building blocks
client_id = 'ABChSnNFdGpxSEGGREUwb01FYzlXZzo4YjZkNmJjMWJlMTIzNzkz' # client ID
coordinates = '-118.481170,34.033302' # the coordinates of our point of interest
distance = 15 # the maximum distance in meters that our image should be from the point it looks toward

# API call URL - we just want one image, and no pagination
# we use the parameters lookat and closeto, because we want an image close to our point of interest but also looking at it
url = ('https://a.mapillary.com/v3/images?client_id={}&per_page=1&lookat={}&closeto={}&radius={}').format(client_id,coordinates,coordinates,distance)

# request a JSON showing the point location and metadata of the images looking at our coordinates
resp = requests.get(url)
data = resp.json()

# there should only be one image, and we will retrieve the first image point feature's attribute called 'key'
image_key = data['features'][0]['properties']['key']

# we will use a template link that shows a JPG from Mapillary, and insert the key
image = 'https://images.mapillary.com/{}/thumb-1024.jpg'.format(image_key)

# print a link to the image and the image key
print(image)
print('image key: {}'.format(image_key))

# request the image URL, to get a displayable image (stream expects a boolean)
r = requests.get(image, stream=True)

# use the display module to print the retrieved image
display(Image(r.content))

Once we run this code, the output will show that our package is waiting at the Santa Monica Flower Shop.

https://images.mapillary.com/oGj6OJ7jd0TQLygl-zeWHA/thumb-1024.jpg
image key: oGj6OJ7jd0TQLygl-zeWHA
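If no image matches the lookat and closeto filters, the features list comes back empty and indexing its first element raises an IndexError. Below is a small sketch of a guard for that case. The helper names first_image_key and thumbnail_url are made up for this example, and the thumbnail template is the one used in the code above.

```python
def first_image_key(data):
    """Return the key of the first image feature, or None when nothing matched."""
    features = data.get('features') or []
    if not features:
        return None
    return features[0]['properties']['key']

def thumbnail_url(image_key):
    """Build the 1024 px thumbnail link for an image key."""
    return 'https://images.mapillary.com/{}/thumb-1024.jpg'.format(image_key)

# an empty response, as we might get for a spot with no coverage
empty = {'type': 'FeatureCollection', 'features': []}
key = first_image_key(empty)
print(thumbnail_url(key) if key else 'no image found near these coordinates')
```

With this in place, the notebook can print a friendly message for an uncovered location instead of stopping on an error.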

Mapillary Sequences API

Next, let’s take a look at the sequences API. In this scenario, a delivery truck was capturing images with a BlackVue dashcam. We want to get a summary of which streets were captured in the area around Newmarket, Ontario, then display it in Leaflet for Jupyter Notebooks. First, our building blocks:

  • bounding box, in the format of bottom left longitude, bottom left latitude, top right longitude, top right latitude: -79.47732925415039,44.04552463793708,-79.453125,44.061994655567105
  • username: *abcourier01*
  • you’ll want to install ipyleaflet and restart your Jupyter Notebook if you don’t have it already

Once this is ready, we can run the code:

import json, requests
from ipyleaflet import Map, GeoJSON


# set our building blocks
client_id = 'ABChSnNFdGpxSEGGREUwb01FYzlXZzo4YjZkNmJjMWJlMTIzNzkz' # client ID
bbox = '-79.47732925415039,44.04552463793708,-79.453125,44.061994655567105'
username = 'abcourier01'

# API call URL, we are not paginating this time but want to make sure to set the page size to maximum, which is 1000
url = ('https://a.mapillary.com/v3/sequences?client_id={}&usernames={}&bbox={}&per_page=1000').format(client_id,username,bbox)

# create an empty GeoJSON (unused in this example, since we save the whole response directly)
output = {"type":"FeatureCollection","features":[]}


with open('courier_sequences.geojson', 'w') as outfile: #set output location 

    #print the API call, so we can click it to preview the first response
    print(url)

    # get the request with no timeout in case API is slow
    r = requests.get(url, timeout=None)

    # check if request failed, if failed, keep trying - 200 is a success
    while r.status_code != 200:
        r = requests.get(url, timeout=None)

    # get a JSON format of the response
    data = r.json()

    # print total number of sequences
    print('Total sequences: {}'.format(len(data['features'])))

    # save the entire JSON response as a geojson for outside use
    json.dump(data, outfile)

# set map center to Newmarket
m = Map(center=(44.055641595944394,-79.46033477783203), zoom=14)

# load in our Mapillary sequence data in red 
routes = GeoJSON(data=data, style = {'color': 'red', 'weight':2})

# add the layer to the map
m.add_layer(routes)

# display the map
m

Running the above, the output is a link, a count of sequences, and a map:

https://a.mapillary.com/v3/sequences?client_id=ABChSnNFdGpxSEGGREUwb01FYzlXZzo4YjZkNmJjMWJlMTIzNzkz&usernames=abcourier01&bbox=-79.47732925415039,44.04552463793708,-79.453125,44.061994655567105&per_page=1000
Total sequences: 11
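The map center in the code above is typed in by hand. When the area of interest changes often, it can be handy to derive a center from the bounding box string instead. This is a sketch under that assumption, and bbox_center is a helper name introduced here. Note that it returns latitude first, which is the order the Map center argument uses above.

```python
def bbox_center(bbox):
    """Return (latitude, longitude) of the middle of a 'minlon,minlat,maxlon,maxlat' string."""
    min_lon, min_lat, max_lon, max_lat = (float(v) for v in bbox.split(','))
    return ((min_lat + max_lat) / 2, (min_lon + max_lon) / 2)

bbox = '-79.47732925415039,44.04552463793708,-79.453125,44.061994655567105'
center = bbox_center(bbox)
print(center)
# in the notebook this could then be passed along as Map(center=center, zoom=14)
```

This is a plain average of the corners, which is fine for a neighborhood-sized box like this one.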

As a bonus, we may want to know the total distance the driver covered. We’ll import the geometry module of Shapely and reuse the existing data variable containing the API data to get a sum of lengths. Keep in mind that the data contains multiple linestring features, so we need to iterate through them. We also need to use the PyProj module to change the projection from EPSG 4326 (degrees) to EPSG 3857 (meters):

from shapely.geometry import Point, LineString
from pyproj import Proj, transform

# set the total distance covered to zero, and we will add to this as we analyze
distance = 0

# our coordinate references: Mapillary API data comes in EPSG:4326, but for meters we convert to EPSG:3857
inProj = Proj(init='epsg:4326')
outProj = Proj(init='epsg:3857')


# loop through all the features in the API data
for feature in data['features']:

    # create empty geometry
    geometry = []

    # for each line feature, get the complete set of linear coordinates
    feature_coords = feature['geometry']['coordinates']

    # for each coordinate pair, we will convert from 4326 to 3857
    for coord in feature_coords:

        # transform longitude and latitude each to 3857
        coord3857x, coord3857y = transform(inProj,outProj,coord[0],coord[1])

        # push the new coordinates to the geometry we created
        geometry.append([coord3857x,coord3857y])

    # create a Shapely linestring from our geometry for this particular feature
    line = LineString(geometry)

    # get the length of the line in kilometers
    length = line.length/1000

    # print the sequence key, timestamp of capture start, and length in kilometers
    print('Sequence {} at {}: {} km'.format(feature['properties']['key'],feature['properties']['captured_at'],length))

    # add the distance of this sequence to the total distance sum
    distance += length

# print the total kilometers driven when all sequences are summed up
print('total distance: {} km'.format(distance))

This one can take some time to run since it is processing each coordinate pair and reprojecting. It may be faster to try another approach, such as opening the GeoJSON response with GeoPandas and changing the projection that way. Either way, the result we are looking for in this case is a small set of statistics:

Sequence skyqvtoba6bxiwwo27e6hr at 2019-10-16T20:46:23.849Z: 2.1538182246583766 km
Sequence hsysyeeflmvzkqbux2r6iy at 2019-10-15T20:09:27.829Z: 7.747716960735397 km
Sequence sfjvew2i1j3pdow3he6w34 at 2019-10-14T18:04:22.882Z: 6.3491184064007635 km
Sequence 6ou480bwnv7xh3oqbm6dfn at 2019-10-07T18:09:56.955Z: 8.77713434931526 km
Sequence tprnhbdlviab3i7ms4b99q at 2019-10-07T17:56:33.755Z: 16.41832349879677 km
Sequence tm72eldcpi5jd72uibzl75 at 2019-10-07T17:06:37.937Z: 11.007820625631481 km
Sequence ssdl978cr4mytqdlfnvvf8 at 2019-10-07T16:18:31.022Z: 8.471543704786498 km
Sequence jc0fpz82ypau93qdh21jfx at 2019-10-07T12:59:29.148Z: 11.161567571232656 km
Sequence yrg0z12hrgoo1vq6xjonc2 at 2019-09-27T20:14:07.158Z: 12.371594176362361 km
Sequence 0j5i7nybr595o5xxzf326e at 2019-09-24T17:06:11.683Z: 8.804659689166494 km
Sequence x513j05kca9wybedixrx7p at 2019-09-18T17:54:34.952Z: 9.393879048138531 km
total distance: 102.65717625522458 km
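One caveat on these numbers: EPSG 3857 is the Web Mercator projection, which stretches distances by roughly 1/cos(latitude), so lengths measured in it at around 44 degrees north come out noticeably longer than the distance on the ground. As a cross-check, here is a sketch that measures each line directly from the longitude and latitude pairs using the haversine formula. It treats the Earth as a sphere, so it is an approximation, and the function names are chosen just for this example.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, spherical approximation

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two lon/lat points in degrees."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def line_length_km(coords):
    """Sum the distances between consecutive [lon, lat] pairs of a linestring."""
    return sum(haversine_km(a[0], a[1], b[0], b[1]) for a, b in zip(coords, coords[1:]))

# one degree of longitude along the equator, for a quick sanity check
print(line_length_km([[0, 0], [1, 0]]))
```

To apply it to the sequences, call line_length_km(feature['geometry']['coordinates']) inside the loop over data['features'] and compare the sum with the projected total.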

Mapillary Map Features API

As a final exercise we can look at how to grab a filtered set of map features. Remember that this requires a subscription for map data, so it doesn’t apply as widely as the image and sequence data above. In this case, you want to make sure the client ID you registered has the scope private:read enabled. You will also need a token to accompany the client ID.

Our scenario: we want to grab the point location of all crosswalks in a neighborhood, as we start planning a pedestrian safety upgrade. We don’t know where the crosswalks are located, so we’ve surveyed the area with a camera, uploaded the images to Mapillary, and now want to download and visualize the extracted crosswalk locations.

Let’s get our building blocks in order:

  • Client ID with private:read (try generating a new client ID for this)
  • Generate a token
  • API URL: https://a.mapillary.com/v3/map_features
  • The value we want to filter for: marking--discrete--crosswalk-zebra
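Because the token grants access to subscription data, it can be worth keeping it out of the notebook itself, especially if the notebook is shared. Here is a minimal sketch that reads both values from environment variables. The variable names MAPILLARY_CLIENT_ID and MAPILLARY_TOKEN and the helper name are assumptions for this example and not something Mapillary defines.

```python
import os

def mapillary_credentials(env=os.environ):
    """Return the client ID and an Authorization header built from the environment."""
    client_id = env['MAPILLARY_CLIENT_ID']  # hypothetical variable name
    token = env['MAPILLARY_TOKEN']          # hypothetical variable name
    return client_id, {'Authorization': 'Bearer {}'.format(token)}

# demonstration with a plain dict standing in for the real environment
demo_env = {'MAPILLARY_CLIENT_ID': 'my-client-id', 'MAPILLARY_TOKEN': 'my-token'}
client_id, header = mapillary_credentials(demo_env)
print(client_id, header)
```

The returned header has the same shape as the one passed to requests.get in the code that follows.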

We’ll set up the code similar to before, with the token included now, and then visualize it on a map:

import json, requests

# create our empty geojson
output = {"type":"FeatureCollection","features":[]}

# define our building blocks
bbox = '-83.14667701721191,42.34506487440754,-83.13028335571289,42.34398642608629'
values = 'marking--discrete--crosswalk-zebra' # reference values you'd like here, separate multiple by comma, leave empty quotes to get all values
client_id = 'ABChSnNFdGpxSEGGREUwb01FYzlXZzo4YjZkNmJjMWJlMTIzNzkz'
token = 'xMkJ8SMaDe6lz3HHaQuWmRvcnAANXn8PneCeJQd9d1qKIyO6t10ygBnrLdNGA5k4flc0w4yhTGieBihrN9vebqtWxFPazldBaGGirB1ONYtm0iXP0kaYcTxhKHmpbMCCXqKVUqv6u0ArDYLMigVtVJev64e3oXh7'

layers = 'points' # choose trafficsigns or points

# define the header that will be passed to the API request--it needs to include the token to authorize access to subscription data
header =  {"Authorization" : "Bearer {}".format(token)}

# build the API call
url = ('https://a.mapillary.com/v3/map_features?layers={}&sort_by=key&client_id={}&per_page=500&values={}&bbox={}').format(layers,client_id,values,bbox)

# print the URL so we can preview it
print(url)

# send the API request with the header and no timeout
r = requests.get(url,timeout=None,headers=header)

# if the call fails, keep trying until it succeeds with a 200 status code
while r.status_code != 200:
    r = requests.get(url,timeout=None,headers=header)

# get data response as a JSON and count how many features were found
data = r.json()
data_length = len(data['features'])

# print number of features
print(data_length)

# add each feature to the empty GeoJSON we created
for f in data['features']:
    output['features'].append(f)

# loop through each new page and continue adding the results to the empty GeoJSON
while data_length == 500:
    link = r.links['next']['url']
    r = requests.get(link,timeout=None,headers=header)
    while r.status_code != 200:
        r = requests.get(link,timeout=None,headers=header) # retry the same page, not the first URL
    data = r.json()
    for f in data['features']:
        output['features'].append(f)

    # print total number of features found so far
    print("Total features: {}".format(len(output['features'])))

    # update length of data in last call to see if it still remains at 500 (maximum) indicating a next page
    data_length = len(data['features'])

with open('detroit_crosswalks.geojson', 'w') as outfile:
    json.dump(output, outfile)

print('DONE')

The results give us the count from the first API call. The script then checks for a next page, which holds only a small number of features, and prints the running total:

https://a.mapillary.com/v3/map_features?layers=points&sort_by=key&client_id=ABChSnNFdGpxSEGGREUwb01FYzlXZzo4YjZkNmJjMWJlMTIzNzkz&per_page=500&values=marking--discrete--crosswalk-zebra&bbox=-83.14667701721191,42.34506487440754,-83.13028335571289,42.34398642608629
500
Total features: 524
DONE
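The retry loops in these examples keep requesting until they see a 200 status, which never ends if the problem is permanent, for example an expired token. One possible alternative is to cap the number of attempts and pause between them. In this sketch the helper name get_with_retry and its attempts and wait parameters are made up for illustration. The get argument is meant to be requests.get, and extra keyword arguments such as headers are passed straight through.

```python
import time

def get_with_retry(get, url, attempts=5, wait=1.0, **kwargs):
    """Call get(url, **kwargs) until it returns status 200 or attempts run out."""
    for attempt in range(1, attempts + 1):
        response = get(url, **kwargs)
        if response.status_code == 200:
            return response
        if attempt < attempts:
            # wait a little longer after each failed attempt
            time.sleep(wait * attempt)
    raise RuntimeError('giving up on {} after {} attempts (last status {})'.format(
        url, attempts, response.status_code))
```

In the map features code, this could stand in for each request and its retry loop with r = get_with_retry(requests.get, url, headers=header, timeout=30), where the 30 second timeout is an arbitrary choice.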

As the final step, let’s import the Leaflet module and plot the data:

from ipyleaflet import Map, GeoJSON

# set map center to Detroit
m = Map(center=(42.34484284244194,-83.13680648803711), zoom=16)

# load in our Mapillary crosswalk data in red, using the collected output so every page is included
crosswalks = GeoJSON(data=output, style = {'color': 'red', 'weight':5})

# add the layer to the map
m.add_layer(crosswalks)

# display the map
m

This then prints our map:

Overall, working with Jupyter Notebooks, Python, and Mapillary is a great way to quickly grab images and data, apply filters, visualize the locations, and export the data in the format you need. The examples in this blog post are introductory and meant to serve as templates, and I hope you’ll take these to an advanced level and get valuable use of the Mapillary data and APIs for your own projects.

If you have questions about specific use cases, interest in a data subscription, or any other feedback, don’t hesitate to reach out to us!

/Chris, Solutions Engineer
