The Video Mesh: A Data Structure for Image-based Three-dimensional Video Editing

Jiawen Chen¹ Sylvain Paris² Jue Wang² Wojciech Matusik¹ Michael Cohen³ Frédo Durand¹

¹MIT CSAIL ²Adobe Systems, Inc. ³Microsoft Research

The video mesh data structure represents the structural information in an input video as a set of deforming texture-mapped triangles augmented with mattes. The mesh has a topology that resembles a "paper cutout." This representation enables a number of special effects applications such as depth-of-field manipulation, object insertion, and change of 3D viewpoint.

Paper: PDF (ICCP 2011)
Supplemental material: PDF
Video (high bitrate): MP4 (67MB, 640x480p24, contains audio)
Video (low bitrate): MP4 (29MB, 640x480p24, contains audio)
Anaglyphic (red/cyan) results: AVI (Lagarith lossless codec, 246 MB, 640x480p24, no audio)
Stills of anaglyphic results: PNGs (best viewed full screen on a 24" (61 cm) monitor at a distance of 30" (76 cm))
Slides from ICCP 2011: PPTX (PowerPoint 2010, WMV videos embedded, 81 MB)

Abstract

This paper introduces the video mesh, a data structure for representing video as 2.5D "paper cutouts." The video mesh allows interactive editing of moving objects and modeling of depth, which enables 3D effects and post-exposure camera control. The video mesh sparsely encodes optical flow as well as depth, and handles occlusion using local layering and alpha mattes. Motion is described by a sparse set of points tracked over time. Each point also stores a depth value. The video mesh is a triangulation over this point set, and per-pixel information is obtained by interpolation. The user rotoscopes occluding contours, and we introduce an algorithm to cut the video mesh along them. Object boundaries are refined with per-pixel alpha values. Because the video mesh is at its core a set of texture-mapped triangles, we leverage graphics hardware to enable interactive editing and rendering of a variety of effects. We demonstrate the effectiveness of our representation with special effects such as 3D viewpoint changes, object insertion, depth-of-field manipulation, and 2D-to-3D video conversion.
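
To make the core idea in the abstract concrete, here is a minimal sketch of how per-pixel depth and motion could be recovered from sparse tracked points by interpolating over a triangle. It is an illustration only, not the authors' implementation: the names `Vertex`, `barycentric`, `depth_at`, and `flow_at` are hypothetical, barycentric weights are assumed as the interpolation scheme, and mesh cutting, local layering, and alpha mattes are left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Vertex:
    """A tracked point (hypothetical layout): position and depth per frame."""
    xy: dict     # frame index -> (x, y) image position
    depth: dict  # frame index -> depth value


def barycentric(p, a, b, c):
    """Barycentric weights of 2D point p with respect to triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        raise ValueError("degenerate triangle")
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return wa, wb, 1.0 - wa - wb


def depth_at(p, tri, frame):
    """Interpolated depth at pixel p in the given frame, or None if p is outside tri."""
    a, b, c = (v.xy[frame] for v in tri)
    w = barycentric(p, a, b, c)
    if min(w) < -1e-9:  # a negative weight means p lies outside the triangle
        return None
    return sum(wi * v.depth[frame] for wi, v in zip(w, tri))


def flow_at(p, tri, frame):
    """Interpolated displacement of pixel p from frame to frame + 1."""
    a, b, c = (v.xy[frame] for v in tri)
    w = barycentric(p, a, b, c)
    dx = sum(wi * (v.xy[frame + 1][0] - v.xy[frame][0]) for wi, v in zip(w, tri))
    dy = sum(wi * (v.xy[frame + 1][1] - v.xy[frame][1]) for wi, v in zip(w, tri))
    return dx, dy


# Example: one triangle that translates by one pixel in x between frames 0 and 1.
tri = (
    Vertex(xy={0: (0.0, 0.0), 1: (1.0, 0.0)}, depth={0: 1.0, 1: 1.0}),
    Vertex(xy={0: (4.0, 0.0), 1: (5.0, 0.0)}, depth={0: 2.0, 1: 2.0}),
    Vertex(xy={0: (0.0, 4.0), 1: (1.0, 4.0)}, depth={0: 3.0, 1: 3.0}),
)
pixel_depth = depth_at((1.0, 1.0), tri, 0)
pixel_flow = flow_at((1.0, 1.0), tri, 0)
```

Storing only the vertices and deriving everything else by interpolation is what keeps the representation sparse. The same weights are what graphics hardware computes when rasterizing texture-mapped triangles, which is consistent with the abstract's remark about leveraging the GPU.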