New Optical Flow Records using Mapillary’s Five Elements of Flow

In our latest work we present five key techniques for improving optical flow prediction, the task of estimating the apparent 2D motion of every pixel between two consecutive images from a video. Our findings result from carefully analyzing shortcomings in existing works, and they help to improve a wide range of those works. We quantitatively and qualitatively surpass the performance of directly comparable works and set new records on challenging optical flow benchmarks.

Optical flow is a powerful tool in computer vision. Given two consecutive images from a video, it provides a dense field of per-pixel 2D motion vectors describing how every object and the rest of the scene move from the first image to the second. This is related but complementary to Structure-from-Motion (SfM), one of Mapillary’s key technologies for generating 3D models from multiple images sharing similar viewpoints of a scene. The main advantages of optical flow over SfM are that it does not require moving cameras, that it can handle moving objects in a scene, and that it derives matchings for each pixel rather than just for a few of them.
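To make the idea of a per-pixel motion vector concrete, here is a minimal sketch on a toy example. The frames, the flow field, and the helper `warp_backward` are hypothetical and only illustrate the definition; they are not taken from our method. Each pixel (x, y) of the first image carries a vector (u, v) that points to its position (x + u, y + v) in the second image, so sampling the second image at those positions should reproduce the first one.

```python
# Hypothetical toy example: frames are tiny grayscale grids (lists of rows).
def warp_backward(frame2, flow):
    """Rebuild frame 1 by sampling frame 2 at (x + u, y + v) for each pixel."""
    height, width = len(frame2), len(frame2[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            u, v = flow[y][x]
            # Nearest-neighbour lookup; targets outside the image stay 0.
            sx, sy = round(x + u), round(y + v)
            if 0 <= sx < width and 0 <= sy < height:
                out[y][x] = frame2[sy][sx]
    return out

# Top row: a bright pixel moves 2 px to the right. Bottom row: static.
frame1 = [[0, 9, 0, 0, 0],
          [5, 0, 0, 0, 7]]
frame2 = [[0, 0, 0, 9, 0],
          [5, 0, 0, 0, 7]]

# One (u, v) vector per pixel: the top row moves, the bottom row does not.
flow = [[(2, 0)] * 5,
        [(0, 0)] * 5]

reconstructed = warp_backward(frame2, flow)
```

With a correct flow field the reconstruction matches the first frame. Real pipelines would use sub-pixel interpolation and would need to handle occlusions, both of which this sketch leaves out.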

Optical Flow prediction
Illustration of our Optical Flow prediction results on the popular Sintel dataset. The colors encode direction and length of the flow predicted for each pixel, e.g. green means flow towards bottom left and magenta corresponds to flow towards top right.
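For readers curious how such a color coding can work, the following is a small sketch that maps a single flow vector to a color. It assumes a plain HSV mapping (direction as hue, length as saturation) with image coordinates where y points down. The function `flow_to_rgb` and its `max_magnitude` parameter are illustrative names, and the exact color wheel used for the Sintel visualizations may differ in its details.

```python
import colorsys
import math

def flow_to_rgb(u, v, max_magnitude):
    """Map one flow vector to an RGB triple in [0, 1] (assumed HSV scheme)."""
    if max_magnitude <= 0:
        raise ValueError("max_magnitude must be positive")
    # Direction becomes hue; y points down, as in image coordinates.
    angle = math.atan2(v, u) % (2 * math.pi)
    hue = angle / (2 * math.pi)
    # Length becomes saturation, so zero motion is rendered white.
    saturation = min(math.hypot(u, v) / max_magnitude, 1.0)
    return colorsys.hsv_to_rgb(hue, saturation, 1.0)

bottom_left = flow_to_rgb(-1.0, 1.0, max_magnitude=math.sqrt(2))
top_right = flow_to_rgb(1.0, -1.0, max_magnitude=math.sqrt(2))
no_motion = flow_to_rgb(0.0, 0.0, max_magnitude=1.0)
```

Under these assumptions a vector pointing to the bottom left comes out predominantly green and one pointing to the top right comes out in the magenta range, which agrees with the convention described in the caption.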

High-quality optical flow predictions yield explanations for the dynamics in a scene and hence improve the way we can generate map objects from multiple images. With our latest work, called Five Elements of Flow, we address a number of shortcomings of previous works, some of which we summarize here:

  • We question the popular technique of using pyramidal data representations, which facilitate capturing the large flow vectors of objects with wide displacements. While such coarse-to-fine techniques can make wide flow estimation tractable in the first place, erroneous decisions taken at the initial, coarse levels are difficult to correct at the finer levels.
  • We include ideas from data distillation to overcome the problem of catastrophic forgetting in deep learning for optical flow estimation.
  • We show how to preserve knowledge from previous training rounds in combination with flow consistency checks generated from our network, and how this improves our results both qualitatively and quantitatively.
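As an illustration of what a flow consistency check can look like, here is a minimal sketch of the widely used forward-backward test. This post does not spell out the exact check we use, so the function `consistency_mask`, its `threshold` parameter, and the toy flow fields are assumptions made for illustration only. The idea is that following the forward flow from the first image to the second, and then the backward flow from there, should lead back to the starting pixel.

```python
import math

def consistency_mask(forward, backward, threshold=1.0):
    """Mark pixels whose forward and backward flow roughly cancel out."""
    height, width = len(forward), len(forward[0])
    mask = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            u, v = forward[y][x]
            # Where does this pixel land in the second image?
            tx, ty = round(x + u), round(y + v)
            if not (0 <= tx < width and 0 <= ty < height):
                continue  # leaves the image, so it cannot be verified
            bu, bv = backward[ty][tx]
            # A consistent round trip returns close to the start.
            mask[y][x] = math.hypot(u + bu, v + bv) < threshold
    return mask

# Toy 1x4 fields: everything moves 1 px right; one backward vector is wrong.
forward = [[(1, 0), (1, 0), (1, 0), (1, 0)]]
backward = [[(-1, 0), (-1, 0), (3, 0), (-1, 0)]]

mask = consistency_mask(forward, backward)
```

Pixels that fail such a test are typically occluded or poorly estimated, so a mask like this could be used to decide which predictions are trusted in a later training round.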

This video shows predictions on KITTI Raw and Sintel when integrating all our Five Elements of Flow, highlighting some of the details in direct comparison to the baseline.

Discussing every Element of Flow in depth is beyond the scope of this blog post. Instead, we highlight that our findings improve a number of different optical flow methods as well as recent depth-from-stereo approaches. They are conceptually simple and easy to implement, yet yield compelling improvements, which we demonstrate in extensive ablation studies and on virtually all relevant optical flow benchmark datasets. Finally, our video shows results obtained on two challenging datasets, highlighting the superior behavior in direct comparison with the baseline we built upon.

/Peter, Director of Research
