Person Tracking with Stereo Range Sensors
Principal Investigators:
Neal Checka
David Demirdjian


The Problem:
The goal of this project is to design and build a multi-person tracking system using a network of stereo cameras. Our system, which stands in an ordinary conference room, tracks people and estimates their trajectories as well as characteristics such as size and posture.
Systems that can track and understand people have a wide variety of commercial applications. It is predicted that computers of the future will interact more naturally with humans than they do now. Instead of following the desktop paradigm, in which humans communicate by typing, they will be able to understand human speech and movements. Our system demonstrates the capabilities of a solely vision-based system for these ends.
Our Approach:
We have developed a system that can perform dense, fast range-based tracking with modest computational complexity. We apply ordered disparity search techniques to prune most of the disparity search computation during foreground detection and disparity estimation, yielding a fast, illumination-insensitive 3D tracking system. When tracking people, we have found that rendering an orthographic vertical projection of detected foreground pixels is a useful model.
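To make the pruning idea concrete, here is a minimal sketch of one way an ordered disparity search could work on a single rectified scanline. It is an illustration under stated assumptions, not the system's implementation: the function name `detect_foreground`, the per-pixel absolute-difference cost, the tolerance `tol`, and the precomputed `background_disparity` list are all hypothetical. The sketch tests the background disparity first and runs a full search only for pixels that fail that test.

```python
def detect_foreground(left, right, background_disparity, max_disparity, tol=10):
    """Label the pixels of one rectified scanline as foreground or background.

    left, right: lists of pixel intensities for the two views.
    background_disparity: per-pixel disparity of the empty scene
        (assumed here to have been estimated ahead of time).
    Returns (foreground_mask, disparities, cost_evaluations).
    """
    mask, disparities, evaluations = [], [], 0
    for x, d_bg in enumerate(background_disparity):
        xr = x - d_bg  # column in the right view predicted by the background
        if xr < 0:
            # The predicted match lies outside the right image: no evidence,
            # so keep the pixel as background without any comparison.
            mask.append(False)
            disparities.append(d_bg)
            continue
        # Ordered search: try the background hypothesis before anything else.
        evaluations += 1
        if abs(left[x] - right[xr]) <= tol:
            mask.append(False)
            disparities.append(d_bg)
            continue
        # Only pixels that fail the background test pay for a full search.
        best_d, best_cost = None, None
        for d in range(max_disparity + 1):
            if x - d < 0:
                break
            evaluations += 1
            cost = abs(left[x] - right[x - d])
            if best_cost is None or cost < best_cost:
                best_d, best_cost = d, cost
        mask.append(True)
        disparities.append(best_d)
    return mask, disparities, evaluations


# Hypothetical toy scanline: background at disparity 1, and a two-pixel
# object at disparity 3 visible at columns 5-6 of the left view.
right = [10, 20, 200, 220, 50, 60, 70, 80]
left = [0, 10, 20, 30, 40, 200, 220, 70]
mask, disparities, evaluations = detect_foreground(left, right, [1] * 8, 4)
```

In this sketch the background test compares the two views with each other rather than against a stored intensity image, so a lighting change that affects both views alike does not by itself trigger a foreground label. A real system would use a window-based matching cost rather than single-pixel differences.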

A "plan view" image facilitates correspondence in time since only a 2D search is required. Previous systems would segment foreground data into regions prior to projecting into a plan view, followed by region-level tracking and integration, potentially leading to sub-optimal segmentation and/or object fragmentation. Instead, we develop a technique that altogether avoids any early segmentation of foreground data. We merge the plan-view images from each view and estimate over time a set of trajectories that best represents the integrated foreground density. The figure shows three people standing in a room, though not all are visible to each camera. Foreground points are projected onto a ground plane. Ground plane points from all cameras are then superimposed into a single data set before clustering the points to find person locations.

Detecting locations of users in a room using multiple views and plan-view integration.
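The plan-view integration described above can be sketched as follows. This is a minimal illustration, not the system's code: it assumes every camera's foreground points are already expressed in a shared world frame with the y axis vertical, and it uses connected-component grouping of occupied ground cells as a simple stand-in for the clustering step, whose actual method is not specified here. The names `plan_view`, `merge_views`, `find_people`, `cell_size`, and `min_points` are hypothetical.

```python
from collections import deque


def plan_view(points, cell_size):
    """Project 3D points (x, y, z), y up, onto ground-plane cells.

    Returns a dict mapping (ix, iz) cell indices to point counts.
    """
    grid = {}
    for x, _y, z in points:  # the orthographic projection drops the height
        cell = (int(x // cell_size), int(z // cell_size))
        grid[cell] = grid.get(cell, 0) + 1
    return grid


def merge_views(grids):
    """Superimpose the plan-view grids of all cameras by summing counts."""
    merged = {}
    for grid in grids:
        for cell, count in grid.items():
            merged[cell] = merged.get(cell, 0) + count
    return merged


def find_people(merged, cell_size, min_points=3):
    """Group 8-connected occupied cells and return one (x, z) per group."""
    seen, people = set(), []
    for start in sorted(merged):
        if start in seen:
            continue
        seen.add(start)
        queue, cells = deque([start]), []
        while queue:
            cx, cz = queue.popleft()
            cells.append((cx, cz))
            for dx in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    nb = (cx + dx, cz + dz)
                    if nb in merged and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        total = sum(merged[c] for c in cells)
        if total >= min_points:  # ignore groups with too little support
            x = sum((c[0] + 0.5) * merged[c] for c in cells) / total * cell_size
            z = sum((c[1] + 0.5) * merged[c] for c in cells) / total * cell_size
            people.append((x, z))
    return people


# Hypothetical data: three people, none of the cameras sees all of them well.
camera_a = [(1.0, 1.7, 1.0), (1.1, 1.2, 1.05), (1.05, 0.9, 1.1),
            (3.0, 1.6, 1.0), (3.1, 1.0, 1.1)]
camera_b = [(3.05, 1.3, 1.05),
            (1.0, 1.5, 4.0), (1.1, 1.0, 4.1), (1.05, 0.8, 4.05)]
merged = merge_views([plan_view(camera_a, 0.25), plan_view(camera_b, 0.25)])
people = find_people(merged, 0.25)
```

In this toy data the second person has too few points in either camera alone to pass `min_points`, and is found only once the two plan views are superimposed, which is the motivation for merging before grouping.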
Trajectory estimation is performed using a...
Future Work:
1. Trevor Darrell, David Demirdjian, Neal Checka, Pedro Felzenszwalb, Plan-view Trajectory Estimation with Dense Stereo Background Models, Proceedings of the International Conference on Computer Vision, 2001.
Demo 1 (9.85 MB): Shows server module detecting locations of people in a room using multiple views. The posture of a person is color coded (sitting = red, standing = green). An active camera tracks the tallest person in the room.