Visualizing the 3D World: A Practical Approach to Polygon Triangulation on the Sphere
At Mapillary, we work to extract as much information as possible from the images contributed to the database. One way of doing that is through semantic segmentation, where we define and label different regions in the image based on the real-life objects they depict. The resulting object detections, in turn, can be used to generate point features that are placed on the map.
To visualize semantic segmentation in the Mapillary viewer (as well as the annotations in our street-level imagery dataset Mapillary Vistas), we use the MapillaryJS library. In MapillaryJS, we render the segmentation polygons and fill them with colors with the help of polygon triangulation.
Note that at Mapillary, we often talk about triangulation related to positioning map features on the map. But polygon triangulation is a different thing. It's a technique in computational geometry that helps create simplified 3D models (meshes) of objects and is often used in geometric modeling and computer graphics. And at Mapillary, as mentioned, we use it to help visualize semantic segmentation in the 3D representation of the world that you see through the Mapillary viewer.
While it's pretty easy to create meshes of shapes on a plane in two dimensions, such as in regular images, it's considerably more complex for spheres—which we use to render panoramic images. And that is a challenge we at Mapillary need to tackle, as there is a considerable amount of 360° imagery on the platform. So let's take a look at how we've worked out a solution.
Start with the vocabulary
Before explaining the challenging aspects of triangulation on the sphere, the topic for this post, let's first define a vocabulary:
- Real 3D world—the space that we live in and capture with cameras when mapping.
- Distorted 2D projection—the projection of the captured images. This is a two-dimensional space, the flat image. Projection types for images uploaded to Mapillary are generally perspective, fisheye, or equirectangular. All projection types have some distortion. For perspective and fisheye images it's radial. For equirectangular images the distortion comes from the representation itself. In this blog post we focus on the equirectangular projection.
- Undistorted 3D space—the rendered space (in MapillaryJS) where textures are undistorted according to some calibration parameters. This space is three-dimensional and its aim is to represent the real 3D world as accurately as possible. In this space, equirectangular (panoramic) images are rendered as spheres.
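To make the equirectangular-to-sphere relationship concrete, here is a minimal sketch of unprojecting a pixel of an equirectangular image to a direction on the unit sphere. The function name and the axis conventions (x right, y down, z forward) are assumptions for illustration, not MapillaryJS's actual API:

```python
import math

def unproject_equirectangular(x, y, width, height):
    """Map a pixel (x, y) in an equirectangular image to a unit-sphere
    direction. Longitude spans [-pi, pi] across the width; latitude
    spans [pi/2, -pi/2] down the height."""
    lng = (x / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - y / height) * math.pi
    return (
        math.cos(lat) * math.sin(lng),  # x: right
        -math.sin(lat),                 # y: down
        math.cos(lat) * math.cos(lng),  # z: forward
    )
```

For example, the center pixel of the image maps to the forward direction (0, 0, 1), and the top edge of the image maps to the pole straight up.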
The image segment visualizations in MapillaryJS are filled with different colors based on the segmentation class. The fill originates from a colored 3D mesh that is placed in front of the image, or inside the image sphere in the case of a panorama, in the undistorted 3D space in the MapillaryJS viewer.
While it's fairly straightforward to create this mesh through triangulation for regular images, it's more complicated for equirectangular 360° panoramas because of their spherical nature. The relations between the polygon vertices (the corners of polygons) change when undistorting from the equirectangular projection.
These relational changes can lead to faulty triangles if the triangulation is performed on the original distorted 2D projection. Triangles can, for example, appear outside the actual polygon outline once unprojected from the distorted 2D projection to the undistorted 3D space. So the polygons can't be triangulated directly on the distorted 2D projection. At the same time, triangulation is an inherently 2D operation, so it can't be performed directly in 3D on the image sphere either. We need another method.
Besides solving the actual triangulation, we also need a method that is performant because we may need to triangulate hundreds of polygons with thousands of vertices for a single image. We need to perform the triangulation while keeping the MapillaryJS viewer as responsive as possible.
An example of what we want to achieve can be seen in the figures below. A simple hexagonal polygon has been segmented on an equirectangular 360° panorama. We want to be able to render it filled with a color in the undistorted 3D space.
The relative positions of the polygon vertices that we want to triangulate are those on the sphere. While we can't triangulate directly on the spherical 3D coordinates, we can perspectively project these 3D coordinates to a 2D plane. Then we can triangulate in 2D.
It is not possible to project all the points on the sphere to a single plane, however: some points will inevitably end up behind the plane.
Therefore, we have to divide the distorted 2D projection into a number of subareas that are small enough that all points end up in front of a chosen plane. Let us divide the image into a grid that ensures that no subarea covers more than 180 degrees. If we choose a grid of 2 x 3 rectangular subareas, no subarea covers more than 120 degrees on the sphere (360°/3 = 120° horizontally and 180°/2 = 90° vertically). After dividing the image, we can divide the triangulation problem into six subproblems by clipping the polygon in each subarea. In our case, we get three clipped polygon parts related to the subareas containing the polygon.
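The per-subarea clipping step can be done with standard Sutherland–Hodgman clipping against each rectangular grid cell. Below is a minimal sketch in Python (for illustration; handling of polygons that wrap across the 360° seam is omitted):

```python
def clip_polygon(points, x_min, y_min, x_max, y_max):
    """Sutherland-Hodgman clipping of a polygon against an axis-aligned
    rectangle. points: list of (x, y) vertices in order. Returns the
    clipped vertex list (empty if the polygon misses the rectangle)."""
    def clip_edge(pts, inside, intersect):
        out = []
        for i, cur in enumerate(pts):
            prev = pts[i - 1]
            if inside(cur):
                if not inside(prev):
                    out.append(intersect(prev, cur))  # entering the cell
                out.append(cur)
            elif inside(prev):
                out.append(intersect(prev, cur))      # leaving the cell
        return out

    def ix(p, q, x):  # intersection with a vertical boundary line
        t = (x - p[0]) / (q[0] - p[0])
        return (x, p[1] + t * (q[1] - p[1]))

    def iy(p, q, y):  # intersection with a horizontal boundary line
        t = (y - p[1]) / (q[1] - p[1])
        return (p[0] + t * (q[0] - p[0]), y)

    pts = list(points)
    pts = clip_edge(pts, lambda p: p[0] >= x_min, lambda p, q: ix(p, q, x_min))
    pts = clip_edge(pts, lambda p: p[0] <= x_max, lambda p, q: ix(p, q, x_max))
    pts = clip_edge(pts, lambda p: p[1] >= y_min, lambda p, q: iy(p, q, y_min))
    pts = clip_edge(pts, lambda p: p[1] <= y_max, lambda p, q: iy(p, q, y_max))
    return pts
```

Running this for every grid cell yields one clipped polygon part per subarea the original polygon touches, and an empty list for the rest.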
After clipping the polygon, we can unproject it to the sphere and then immediately project it to a plane with the normal in the direction of the center of the grid rectangle to ensure that all points are in front of the plane.
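The projection step can be sketched as follows: treat the sphere center as the camera, point the principal ray at the center of the grid rectangle, and intersect each ray with the plane tangent to the sphere at that point. This is an illustrative sketch, not MapillaryJS code; the grid subdivision is what guarantees the `d > 0` precondition below:

```python
import math

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    n = math.sqrt(dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)

def project_to_plane(points_3d, normal):
    """Perspectively project unit-sphere points onto the plane tangent to
    the sphere at `normal` (camera at the sphere center, principal ray
    along `normal`). Every point must lie strictly in front of the
    plane, i.e. dot(point, normal) > 0."""
    n = normalize(normal)
    # Build an orthonormal basis (u, v) spanning the projection plane.
    up = (0.0, 1.0, 0.0) if abs(n[1]) < 0.9 else (1.0, 0.0, 0.0)
    u = normalize(cross(up, n))
    v = cross(n, u)
    result = []
    for p in points_3d:
        d = dot(p, n)
        assert d > 1e-9, "point lies behind the projection plane"
        q = (p[0] / d, p[1] / d, p[2] / d)  # scale the ray onto the plane
        result.append((dot(q, u), dot(q, v)))
    return result
```

The returned (u, v) pairs are plain 2D coordinates on the plane, ready for triangulation.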
Once projected, we can now triangulate the clipped polygon and then fill the triangles with color.
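For the triangulation itself, a production system would typically use a robust, fast library (such as earcut); for illustration, here is a minimal ear-clipping sketch that returns triangles as index triples, so the indices can later be mapped back to the corresponding 3D coordinates:

```python
def triangulate(points):
    """Ear-clipping triangulation of a simple polygon whose (x, y)
    vertices are listed in counter-clockwise order. Returns triangles
    as index triples into `points`."""
    def area2(a, b, c):  # twice the signed area of triangle a-b-c
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

    def inside(a, b, c, p):  # p strictly inside CCW triangle a-b-c
        return area2(a, b, p) > 0 and area2(b, c, p) > 0 and area2(c, a, p) > 0

    indices = list(range(len(points)))
    triangles = []
    while len(indices) > 3:
        for i in range(len(indices)):
            i_prev = indices[i - 1]
            i_cur = indices[i]
            i_next = indices[(i + 1) % len(indices)]
            a, b, c = points[i_prev], points[i_cur], points[i_next]
            if area2(a, b, c) <= 0:
                continue  # reflex or degenerate corner: not an ear
            if any(inside(a, b, c, points[j]) for j in indices
                   if j not in (i_prev, i_cur, i_next)):
                continue  # another vertex lies inside: not an ear
            triangles.append((i_prev, i_cur, i_next))
            del indices[i]  # clip the ear and keep going
            break
        else:
            break  # no ear found (degenerate input); bail out
    if len(indices) == 3:
        triangles.append(tuple(indices))
    return triangles
```

A simple polygon with n vertices always yields n − 2 triangles, so a square gives two and an L-shape with six vertices gives four.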
If we now assemble all the triangles from the different subareas, we have our completed triangulation. We can render the polygon with fill in undistorted 3D space.
Below is a simplified pseudo-algorithm for triangulating a polygon on the sphere:
1. Divide the original image into x times y rectangular subareas, where x >= 3 and y >= 2, to ensure that each subarea covers at most an angle of 120 degrees on the sphere.
2. Create an empty 3D coordinate triangles array.
3. For each subarea:
   1. Clip the polygon according to the subarea boundaries using the distorted 2D coordinates.
   2. Unproject the distorted 2D coordinates of the polygon to undistorted 3D coordinates.
   3. Project the undistorted 3D coordinates to a plane in front of a camera with the principal ray going through the center of the subarea.
   4. Triangulate the projected 2D coordinates.
   5. Add the undistorted 3D coordinates corresponding to the triangle indices to the triangles array.
4. Use the assembled 3D coordinate triangle array.
The end result can be seen in the embedded viewer below.
Visualizing the vast information we extract from the images is essential to make use of, evaluate, and understand it. With our practical and performant approach to triangulating polygons defined on the sphere, we can now visualize segmentations detected on equirectangular panoramas in a better way by rendering polygons on the sphere with fill. We do it by dividing the spherical triangulation problem into smaller subproblems that are simpler to solve. Then we combine the results from each subproblem into the final solution.
/Oscar, Computer Vision Developer