1. Reconstructing time-varying scenes in 4D from Internet photos. Our work on "Scene Chronology" showed how to take hundreds of thousands or millions of photos of a time-varying place like Times Square and automatically create a 4D model: a 3D model with a time slider that captures changes in scene geometry and appearance across time. This work is based on a new scalable and robust algorithm that takes photos captured at different times and places, with noisy or missing timestamps, and infers both the 3D structure of the scene and the time span during which each object existed. The resulting 4D model can also be used to approximately timestamp new photos, using constraints based on which objects are visible in the image. This work appeared at ECCV 2014, where it received the best paper award.
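To make the time-span inference concrete, the following is a minimal sketch (not the paper's actual algorithm; all function and variable names are hypothetical) of two of the ideas above: robustly estimating an object's existence interval from the noisy timestamps of the photos that observe it, and bounding when a new photo was taken by intersecting the intervals of the objects visible in it.

```python
import numpy as np

def estimate_time_span(observation_times, lo_pct=5, hi_pct=95):
    """Estimate an object's existence interval from the timestamps of
    the photos that observe it, using percentiles to trim outliers
    caused by noisy or wrong timestamps."""
    times = np.asarray(observation_times, dtype=float)
    return np.percentile(times, lo_pct), np.percentile(times, hi_pct)

def timestamp_photo(visible_object_spans):
    """Bound when a photo was taken: it must fall within the existence
    interval of every object visible in it, so intersect the intervals."""
    starts, ends = zip(*visible_object_spans)
    lo, hi = max(starts), min(ends)
    if lo > hi:
        return None  # inconsistent observations; no feasible time
    return lo, hi

# Example: three objects visible in a query photo, each with an existence
# interval (in fractional years) estimated from the photo collection.
spans = [estimate_time_span([2004.1, 2004.3, 2007.9, 2008.2]),
         (2006.0, 2010.5),
         (2003.2, 2009.1)]
print(timestamp_photo(spans))  # feasible time interval for the query photo
```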
2. Estimating scene reflectance and illumination from large-scale image collections. We have developed new methods for automatically decomposing scenes into intrinsic reflectance maps and per-image illumination maps from large, uncalibrated Internet photo collections. Our approach uses the statistics of outdoor illumination, as predicted by computer graphics models of sun and sky illumination, and connects these statistics with observations of the scene made across thousands of photos taken at unknown times. A key innovation is to model how the illumination at each point is affected by occlusion of the surrounding environment (known in computer graphics as "ambient occlusion"). Ours is one of the first computer vision algorithms to explicitly model ambient occlusion.
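As a simplified illustration of this kind of decomposition, the sketch below factors an aligned stack of photos of a scene into per-pixel reflectance and a per-image illumination scalar by least squares in the log domain. This is a deliberate simplification of our method: it omits the sun/sky illumination statistics and the per-pixel ambient-occlusion factor described above, and all names are hypothetical.

```python
import numpy as np

def decompose(images):
    """images: (num_images, num_pixels) array of positive intensities for
    the same scene points observed across photos, modeled as
    I[i, p] = L[i] * R[p] (illumination times reflectance).
    Returns (per-pixel reflectance, per-image illumination); the gauge is
    fixed so that log-illumination averages to zero."""
    log_I = np.log(images)
    log_R = log_I.mean(axis=0)            # per-pixel log reflectance
    log_L = (log_I - log_R).mean(axis=1)  # per-image log illumination
    return np.exp(log_R), np.exp(log_L)

# Example: a synthetic scene with known reflectance and illumination.
rng = np.random.default_rng(0)
R_true = rng.uniform(0.2, 0.9, size=100)  # per-pixel albedo
L_true = rng.uniform(0.5, 2.0, size=20)   # per-photo brightness
images = np.outer(L_true, R_true)
R_est, L_est = decompose(images)
# The factors are recovered up to a single global scale.
print(np.allclose(R_est * L_est[0], images[0]))  # True
```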
3. Algorithms and datasets for automatically "grounding" photos in the world by geo-tagging and time-stamping them. We have created new ways to automatically determine where and when a photo was taken. Our geo-tagging method matches photos to a large, world-wide 3D model of places around the globe, built using structure-from-motion techniques (a "world-wide point cloud" containing hundreds of millions of points with appearance descriptors). To make this approach scale, we have developed new methods for building structure-from-motion models of challenging scenes around the world (many of which contain repeated structures that confuse standard reconstruction algorithms), as well as new methods for compressing 3D models while retaining as much useful information as possible. Using the resulting compressed world-wide point cloud, we can quickly match a new image against the model and estimate its camera pose, precisely geolocating it.
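The sketch below illustrates the localization step under simplifying assumptions: brute-force descriptor matching (a real system at this scale would use an approximate index over the compressed cloud) followed by standard PnP with RANSAC via OpenCV. Function and variable names are hypothetical.

```python
import numpy as np
import cv2

def localize(query_kps, query_descs, cloud_points, cloud_descs, K):
    """query_kps: (N, 2) pixel coordinates of features in the photo.
    query_descs / cloud_descs: (N, D) / (M, D) float32 descriptors.
    cloud_points: (M, 3) world coordinates. K: 3x3 camera intrinsics.
    Returns (rotation vector, translation vector, inlier indices); the
    camera's world position (the geolocation) is then -R.T @ t, where
    R = cv2.Rodrigues(rvec)[0]."""
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = matcher.match(query_descs, cloud_descs)
    pts_2d = np.float32([query_kps[m.queryIdx] for m in matches])
    pts_3d = np.float32([cloud_points[m.trainIdx] for m in matches])
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d, pts_2d, K, distCoeffs=None,
        iterationsCount=1000, reprojectionError=4.0)
    if not ok:
        raise RuntimeError("pose estimation failed")
    return rvec, tvec, inliers
```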
We have also developed ways to use our models of scene illumination and appearance over time to automatically timestamp photos. Using transient scene elements, we can approximately date a photo, and using illumination cues (i.e., the sun direction), we can estimate the time of day at which it was taken.
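As an illustration of the sun-direction idea, here is a minimal sketch that recovers the solar time of day by searching for the time whose predicted sun position best matches an observed one, using standard textbook solar-position formulas (accurate to roughly a degree). It assumes the photo's latitude, date, and observed sun direction are already known; this is not our actual method, and all names are hypothetical.

```python
import numpy as np

def sun_direction(lat_deg, day_of_year, solar_hour):
    """Approximate solar altitude and azimuth (degrees; azimuth clockwise
    from north) for a given latitude, day of year, and local solar time,
    using the standard declination and altitude/azimuth formulas."""
    lat = np.radians(lat_deg)
    decl = np.radians(-23.44) * np.cos(2 * np.pi / 365 * (day_of_year + 10))
    hour_angle = np.radians(15.0 * (solar_hour - 12.0))
    sin_alt = (np.sin(lat) * np.sin(decl)
               + np.cos(lat) * np.cos(decl) * np.cos(hour_angle))
    az = np.arctan2(np.sin(hour_angle),
                    np.cos(hour_angle) * np.sin(lat)
                    - np.tan(decl) * np.cos(lat)) + np.pi
    return np.degrees(np.arcsin(sin_alt)), np.degrees(az)

def time_from_sun(lat_deg, day_of_year, observed_alt, observed_az):
    """Recover the solar time of a photo by searching over times of day
    for the best match to the observed sun direction."""
    best, best_err = None, np.inf
    for h in np.linspace(0.0, 24.0, 24 * 60):  # one-minute resolution
        alt, az = sun_direction(lat_deg, day_of_year, h)
        d_az = (az - observed_az + 180.0) % 360.0 - 180.0  # wraparound
        err = (alt - observed_alt) ** 2 + d_az ** 2
        if err < best_err:
            best, best_err = h, err
    return best

# Example: a photo at latitude 40.76 N (Times Square) in late June whose
# shadows imply sun altitude 60 deg and azimuth 220 deg (hypothetical).
print(time_from_sun(40.76, 172, 60.0, 220.0))  # solar time of day
```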
Supported by the National Science Foundation