The Video Mesh: A Data Structure for Image-based Three-dimensional Video Editing

Jiawen Chen¹, Sylvain Paris², Jue Wang², Wojciech Matusik¹, Michael Cohen³, Frédo Durand¹

¹MIT CSAIL, ²Adobe Systems, Inc., ³Microsoft Research

Teaser

The video mesh data structure represents the structural information in an input video as a set of deforming, texture-mapped triangles augmented with mattes. The mesh has a topology that resembles a "paper cutout." This representation enables a number of special-effects applications, such as depth-of-field manipulation, object insertion, and change of 3D viewpoint.

Paper: PDF (ICCP 2011)
Supplemental material: PDF
Video (high bitrate): MP4 (67MB, 640x480p24, contains audio)
Video (low bitrate): MP4 (29MB, 640x480p24, contains audio)
Anaglyphic (red/cyan) results: AVI (Lagarith lossless codec, 246 MB, 640x480p24, no audio)
Stills of anaglyphic results: PNGs (best viewed full screen on a 24" (61 cm) monitor at a distance of 30" (76 cm))
Slides from ICCP 2011: PPTX (PowerPoint 2010, WMV videos embedded, 81 MB)

Abstract

This paper introduces the video mesh, a data structure for representing video as 2.5D "paper cutouts." The video mesh allows interactive editing of moving objects and modeling of depth, which enables 3D effects and post-exposure camera control. The video mesh sparsely encodes optical flow as well as depth, and handles occlusion using local layering and alpha mattes. Motion is described by a sparse set of points tracked over time. Each point also stores a depth value. The video mesh is a triangulation over this point set, and per-pixel information is obtained by interpolation. The user rotoscopes occluding contours, and we introduce an algorithm to cut the video mesh along them. Object boundaries are refined with per-pixel alpha values. Because the video mesh is at its core a set of texture-mapped triangles, we leverage graphics hardware to enable interactive editing and rendering of a variety of effects. We demonstrate the effectiveness of our representation with special effects such as 3D viewpoint changes, object insertion, depth-of-field manipulation, and 2D to 3D video conversion.
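
To make the "sparse points plus interpolation" idea concrete, here is a minimal Python sketch of how per-pixel depth could be recovered inside a single triangle of tracked points. It is an illustration only, not the paper's implementation: the names `TrackedVertex`, `barycentric`, and `interpolate_depth` are hypothetical, the per-frame storage layout is an assumption, and barycentric weighting is assumed as the interpolation scheme (it is what graphics hardware applies across a triangle).

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple

Point2 = Tuple[float, float]


@dataclass
class TrackedVertex:
    """Hypothetical mesh vertex: a point tracked over time with a depth value."""
    positions: Dict[int, Point2]  # frame index -> (x, y) image position
    depths: Dict[int, float]      # frame index -> depth (assumed stored per frame)


def barycentric(p: Point2, a: Point2, b: Point2, c: Point2
                ) -> Optional[Tuple[float, float, float]]:
    """Barycentric weights of p with respect to triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0.0:
        return None  # degenerate (zero-area) triangle
    w_a = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    w_b = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return w_a, w_b, 1.0 - w_a - w_b


def interpolate_depth(p: Point2, tri: Sequence[TrackedVertex],
                      frame: int) -> Optional[float]:
    """Depth at pixel p in the given frame, or None if p is outside the triangle."""
    a, b, c = (v.positions[frame] for v in tri)
    weights = barycentric(p, a, b, c)
    if weights is None or min(weights) < 0.0:
        return None
    return sum(w * v.depths[frame] for w, v in zip(weights, tri))


# Example: one triangle at frame 0 with vertex depths 1, 2, and 3.
triangle = [
    TrackedVertex({0: (0.0, 0.0)}, {0: 1.0}),
    TrackedVertex({0: (4.0, 0.0)}, {0: 2.0}),
    TrackedVertex({0: (0.0, 4.0)}, {0: 3.0}),
]
depth_inside = interpolate_depth((1.0, 1.0), triangle, frame=0)
depth_outside = interpolate_depth((5.0, 5.0), triangle, frame=0)
```

The same weights could interpolate any other per-vertex quantity, such as the frame-to-frame displacement of the tracked points, which is one way a triangulation can stand in for dense optical flow.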