Multimodal Interaction With Speech & Gestures


Principal Investigator:

David Demirdjian 


Demos:

·        multi-modal studio: Demo1

·        virtual navigation: Demo1

·        body tracking: Demo1 Demo2


The main objective of this work is to design a Perceptual User Interface that provides:

·        the pose of the user (the location and orientation of the arms and head)

·        the detection of gestures


Pose and gesture information is then combined with speech recognition to interact with an application, e.g. virtual-world navigation and manipulation, a video game, or interaction with an avatar.
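The page does not describe how the modalities are fused, but the classic use case is deictic reference: a spoken command such as "select that" is resolved against whatever object the tracked pointing ray indicates. A minimal sketch of this idea (all function and object names here are hypothetical, not from the actual system):

```python
import numpy as np

def resolve_deictic(ray_origin, ray_dir, objects):
    """Return the name of the object closest to the pointing ray.

    ray_origin, ray_dir -- 3D ray from the tracked hand/arm pose
    objects             -- dict mapping object name -> 3D position
    """
    o = np.asarray(ray_origin, float)
    d = np.asarray(ray_dir, float)
    d = d / np.linalg.norm(d)
    best, best_dist = None, np.inf
    for name, pos in objects.items():
        v = np.asarray(pos, float) - o
        s = max(v @ d, 0.0)               # project onto the ray (in front only)
        dist = np.linalg.norm(v - s * d)  # perpendicular distance to the ray
        if dist < best_dist:
            best, best_dist = name, dist
    return best

def interpret(speech, ray_origin, ray_dir, objects):
    """Toy fusion rule: a demonstrative ("that") in the recognized
    speech triggers gesture-based reference resolution."""
    if "that" in speech.split():
        target = resolve_deictic(ray_origin, ray_dir, objects)
        return speech.replace("that", target)
    return speech
```

For example, with a lamp at (1, 0, 0) and a door at (0, 1, 0), pointing along the x-axis while saying "select that" would resolve to the lamp.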



This work segments and tracks, in real time, the gestures of a user observed by a stereo camera. More precisely, a 3D model of the person is compared to the set of 3D points reconstructed from stereo and updated using a technique similar to ICP (Iterative Closest Point). The resulting tracker is fast (15 Hz on a Pentium IV) and robust to illumination changes.
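The core ICP-style update can be sketched as follows. This is a minimal illustration for a single rigid part, not the articulated body model of the actual system: correspondences are nearest neighbours between the current model points and the stereo point cloud, and each iteration solves for the best rigid transform in closed form (the Kabsch/SVD solution).

```python
import numpy as np

def icp_rigid_update(model_pts, scene_pts, n_iters=20):
    """Align model points (N x 3) to stereo-reconstructed scene points
    (M x 3) with a rigid transform (R, t), ICP-style.

    Each iteration: (1) match every model point to its nearest scene
    point, (2) solve for the rigid transform that best maps the
    current model points onto their matches (Kabsch/SVD), and
    (3) compose that incremental transform into (R, t)."""
    R = np.eye(3)
    t = np.zeros(3)
    for _ in range(n_iters):
        cur = model_pts @ R.T + t
        # brute-force nearest-neighbour correspondences
        d2 = ((cur[:, None, :] - scene_pts[None, :, :]) ** 2).sum(-1)
        matched = scene_pts[d2.argmin(axis=1)]
        # Kabsch: optimal rigid transform mapping cur -> matched
        mu_a, mu_b = cur.mean(0), matched.mean(0)
        H = (cur - mu_a).T @ (matched - mu_b)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R_step = Vt.T @ D @ U.T
        t_step = mu_b - R_step @ mu_a
        # compose: x -> R_step (R x + t) + t_step
        R = R_step @ R
        t = R_step @ t + t_step
    return R, t
```

The articulated tracker described in the publications below extends this kind of update to a kinematic chain of body parts; the sketch above shows only the underlying point-set alignment step.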



D. Demirdjian, L. Taycher, G. Shakhnarovich, K. Grauman and T. Darrell. Avoiding the "Streetlight Effect": Tracking by Exploring Likelihood Modes. In ICCV'05. [PDF]

D. Demirdjian, T. Ko and T. Darrell. Untethered Gesture Acquisition and Recognition for Virtual World Manipulation. In Virtual Reality (to appear). [PDF]

D. Demirdjian and T. Darrell. 3-D Articulated Pose Tracking for Untethered Deictic Reference. In Proceedings of ICMI'02, October 2002, Pittsburgh, Pennsylvania. [PDF]