Multi-Sensory Perceptive Systems: Human and Machine Processing of Multi-Modal Data

John Fisher (MIT AI lab) - Ladan Shams (Cal Tech) - Virginia de Sa (UCSD)
Malcolm Slaney (IBM Almaden) - Trevor Darrell (MIT AI lab)

Saturday, December 8th

Scope

All perception is multi-sensory perception. Situations where animals are exposed to information from a single modality exist only in experimental settings in the laboratory. For a variety of reasons, research on perception has focused on processing within one sensory modality. Consequently, the state of knowledge about multi-sensory fusion in mammals is largely at the level of phenomenology, and the underlying mechanisms and principles are poorly understood. Recently, however, there has been a surge of interest in this topic, and the field is emerging as one of the fastest-growing areas of research in perception.

Simultaneously, with the advent of low-cost, low-power multimedia sensors, there has been renewed interest in automated multi-modal data processing. Whether in an intelligent room environment, a heterogeneous sensor array, or an autonomous robot, robust integrated processing of multiple modalities has the potential to solve perception problems more efficiently by leveraging complementary sensor information.

Goals

The goals of this workshop are to further the understanding of both the cognitive mechanisms by which humans (and other animals) integrate multi-modal data and the means by which automated systems may similarly function. It is not our contention that one should follow the other. It is our contention that researchers in these different communities stand to gain much through interaction with each other. This workshop aims to bring these researchers together to compare methods and performance and to develop a common understanding of the underlying principles that might be used to analyze both human and machine perception of multi-modal data. Discussions and presentations will span theory, application, and relevant aspects of animal and machine perception.

Topical Issues

The workshop will emphasize a moderated discussion format, with short presentations prefacing each discussion. The presentations and discussions will be organized around the following related questions:

Automated Systems

How should one model joint audio/video properties (e.g. statistically)?
At what level (and why) should one fuse joint audio/video measurements (e.g. at the signal, feature, or decision level)?
How should one deal with high dimensionality?
How are fusion methods impacted by co-located sensors (human eyes and ears) vs. distributed sensors (intelligent rooms)?
Do the joint statistical models being used predict (or explicitly model) any of the known psychophysical phenomena (e.g. McGurk effect, ventriloquism effect, head-related transfer function)?

Human (Animal) Perception

What is a good general phenomenological characterization of the direction of crossmodal interactions? That is, what factors determine the dominance of one modality over the others in a given situation?
At what level do crossmodal interactions occur? Early sensory, late sensory, association sensory, cognitive, or all of the above?
Why have multisensory integration? To enhance perceptual resolution, for perceptual learning, for ecological validity, or for some other advantage?
How can the gap between the neural data on polysensory neurons and systems-level data such as imaging, modeling, and psychophysical data be closed or narrowed? What kinds of studies will be needed?
What kind of theoretical framework seems most suitable for accounting for multisensory integration?

gregory@ai.mit.edu