The new NVIDIA Tesla V100 graphics processing units and TensorRT 3.0 library together with Amazon EC2 P3 instances make Mapillary’s semantic segmentation models 27 times faster while using 81% less memory.
Advanced algorithms for semantic segmentation demand a lot of computation and memory resources, especially when applied to high-resolution image data. That means it can be quite costly to run these recognition models in large-scale production environments like Mapillary, where hundreds of thousands of images need to be segmented every day.
NVIDIA recently launched the Tesla V100 graphics processing units, based on the latest NVIDIA Volta GPU architecture, which is designed to handle such demanding tasks. These cards are now available on Amazon EC2, where Mapillary's production environment runs.
Amazon Web Services recently made new EC2 P3 instances available. They are powered by up to 8 NVIDIA Tesla V100 GPUs and are intended for computation-heavy applications such as machine learning. To further decrease runtime and memory requirements during inference (i.e. when the algorithm classifies previously unseen data based on what it has learnt during training), NVIDIA recently published a release candidate of the TensorRT 3.0 library, which aims to optimize performance on Volta GPUs.
We put this setup to the test to see how much we could reduce the runtime and memory needs of Mapillary’s semantic segmentation models.
Mapillary uses two semantic segmentation approaches in production, each addressing different needs. The Mapillary Standard Segmentation is tuned to run cost-efficiently while accepting some loss in accuracy. This model is applied to every image in Mapillary’s database (currently 226 million images). The Mapillary HD Segmentation model is set to maximize accuracy at the expense of longer runtimes and higher memory demands. It is used for selected images when very high accuracy or recognition of delicate objects is needed.
The baseline in this test was our existing semantic segmentation approach, which is built on top of Caffe and deployed on Amazon EC2 P2 instances with NVIDIA Tesla K80 GPUs. The test setup used NVIDIA Tesla V100 GPUs and TensorRT 3.0 deployed on Amazon EC2 P3 instances.
Our experiments show that the setup on Amazon EC2 P3, with the latest generation of Volta-powered NVIDIA GPUs and TensorRT 3.0, speeds up our semantic segmentation algorithms by up to 27 times while reducing memory requirements by up to 81%.
Breaking it down, we sped up the HD segmentation 27 times and the Standard one 18 times, with corresponding memory reductions of 81% and 74%. The gradual speed improvements can be seen in the top figures, while the bottom figures show the detailed breakdown of the memory reductions. (Note that using Caffe as the baseline yields higher speedups and memory reductions than would be seen with a more advanced framework like PyTorch or TensorFlow as the baseline.)
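To put these figures in perspective, here is a small back-of-the-envelope sketch that converts the reported speedups and memory reductions into the share of the original runtime and memory that remains. The function names are made up for illustration; only the four numbers come from our measurements above.

```python
# Reported results from the experiments above: (speedup factor, memory reduction in %).
RESULTS = {
    "HD": (27, 81),
    "Standard": (18, 74),
}


def remaining_runtime_fraction(speedup):
    """Fraction of the baseline runtime still needed after an N-times speedup."""
    return 1.0 / speedup


def remaining_memory_fraction(reduction_percent):
    """Fraction of the baseline memory still needed after an X% reduction."""
    return 1.0 - reduction_percent / 100.0


for name, (speedup, reduction) in RESULTS.items():
    runtime = remaining_runtime_fraction(speedup)
    memory = remaining_memory_fraction(reduction)
    print(f"{name}: {runtime:.1%} of baseline runtime, {memory:.0%} of baseline memory")
```

In other words, a 27 times speedup means an image takes roughly 1/27 of the time it used to, and an 81% memory reduction leaves 19% of the original footprint.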
One limitation of the current TensorRT version is the need to specify the input data shape at build time. Since images at Mapillary have varying aspect ratios and sizes, we could process them more efficiently if we could reshape the width and height of the input layer after initializing the network. We're really impressed with the significant speedup and memory reduction from using NVIDIA Tesla V100 on Amazon EC2 P3 instances together with TensorRT 3.0, and we hope this capability will become available in upcoming TensorRT releases.
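To illustrate why a fixed input shape costs efficiency, here is a minimal sketch of one common workaround: scaling each image to fit a fixed network input and padding the rest. This is an assumption for illustration, not a description of our production pipeline; the function names and the 2048×1024 input size are hypothetical.

```python
def letterbox_params(src_w, src_h, dst_w, dst_h):
    """Scale an image to fit inside a fixed dst_w x dst_h input, keeping aspect ratio.

    Returns (new_w, new_h, pad_x, pad_y), where pad_x and pad_y are the total
    number of padding pixels needed horizontally and vertically.
    """
    scale = min(dst_w / src_w, dst_h / src_h)
    # Clamp to the target size to guard against floating-point rounding.
    new_w = min(dst_w, max(1, round(src_w * scale)))
    new_h = min(dst_h, max(1, round(src_h * scale)))
    return new_w, new_h, dst_w - new_w, dst_h - new_h


def wasted_fraction(src_w, src_h, dst_w, dst_h):
    """Share of the fixed input that is padding rather than image content."""
    new_w, new_h, _, _ = letterbox_params(src_w, src_h, dst_w, dst_h)
    return 1.0 - (new_w * new_h) / (dst_w * dst_h)


# Hypothetical example: a 4:3 photo fed into a 2:1 network input.
print(letterbox_params(4000, 3000, 2048, 1024))
print(f"{wasted_fraction(4000, 3000, 2048, 1024):.0%} of the input is padding")
```

The network spends the same computation on padded pixels as on real ones, so the further an image's aspect ratio is from the fixed input shape, the more work is wasted. Being able to reshape the input layer per image would avoid this.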
P.S. Take a look at the Mapillary feature on the NVIDIA blog as well.