Object Classification Within and Across Scenes

Automated visual perception of the real world requires classifying observed physical objects into semantically meaningful categories (such as `car', `person' and `truck'). Such a classification system must perform well under a wide range of scene conditions, including varying position, orientation and motion of objects relative to the camera, and varying illumination. From a pattern recognition perspective, the task is to assign the most likely label to each object based on a set of observed features. Classification performance is best when the object features used are distributed identically in training and test data. Identifying such a set of features is difficult: the distributions of many intuitively useful object features, such as size and aspect ratio, are scene-dependent, and only a limited amount of training data is available.

We present an object classification framework that initially uses scene-invariant features, to allow good performance in any scene, and provides a mechanism for later incorporating scene-specific features and adapting the classifier to the feature distribution of a given scene with the help of unlabelled data from that scene. To this end, we perform feature selection using estimates of the mutual information between object features and class labels. We implement our framework in the context of classifying moving objects (mostly vehicles and pedestrians) that are detected and tracked in far-field video sequences captured by static, uncalibrated cameras. Special conditions of this setting include the low resolution of the image data available for each object, and the need to combine evidence from the multiple instances of each object observed during tracking. Using a simple yet exhaustive feature set based on spatial moments and temporal derivatives, we identify and interpret the scene-invariant and scene-specific features associated with the classification task. Using a modified support vector machine classifier, we demonstrate that scene transfer and adaptation improve classification performance. Experimental results are presented for outdoor visual surveillance across a wide variety of scenes.
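As a rough illustration of feature selection by mutual information between object features and class labels, the sketch below scores each feature with a simple plug-in (histogram) estimate and ranks the features by that score. The abstract does not specify the estimator used, so the equal-width binning, the function names (`quantize`, `mutual_information`, `rank_features`) and the toy values are all assumptions made for illustration only.

```python
import math
from collections import Counter


def quantize(values, n_bins=4):
    """Equal-width binning of a continuous feature (an assumed, simple choice)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant features
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(feature_bins, labels):
    """Plug-in estimate of I(F; C) in bits from paired discrete samples."""
    n = len(labels)
    joint = Counter(zip(feature_bins, labels))
    f_counts = Counter(feature_bins)
    c_counts = Counter(labels)
    mi = 0.0
    for (f, c), n_fc in joint.items():
        # p(f, c) * log2( p(f, c) / (p(f) * p(c)) ), written with raw counts
        mi += (n_fc / n) * math.log2(n_fc * n / (f_counts[f] * c_counts[c]))
    return mi


def rank_features(features, labels, n_bins=4):
    """Return (name, score) pairs sorted by decreasing estimated MI."""
    scores = {
        name: mutual_information(quantize(values, n_bins), labels)
        for name, values in features.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up toy values, not data from the thesis.
    labels = ["car", "car", "person", "person"]
    features = {
        "informative": [1.0, 1.2, 3.0, 3.1],
        "uninformative": [1.0, 3.0, 1.1, 3.1],
    }
    for name, score in rank_features(features, labels, n_bins=2):
        print(f"{name}: {score:.3f} bits")
```

Under this sketch, a feature whose estimated mutual information with the labels stays high across scenes would be a candidate scene-invariant feature, while one that scores high only within a single scene would be a candidate scene-specific feature. Plug-in estimates are biased for small samples, so a practical implementation would likely need a more careful estimator.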

Related publication: Biswajit Bose, "Classification of Tracked Objects in Far-Field Video Surveillance", Master's thesis, Massachusetts Institute of Technology, Cambridge, MA, 2004.