Context-based Visual Feedback Recognition
 
Principal Investigators:
Goal:
Head pose and gesture offer several conversational grounding cues and are used extensively in
face-to-face interaction among people. To recognize visual feedback efficiently, humans often use contextual knowledge from previous and current events to anticipate when feedback is most likely to occur. In this research we describe how contextual information can be used to predict visual feedback and improve recognition of head gestures in human-computer interfaces. Lexical, prosodic, timing, and gesture
features can be used to predict a user's visual feedback during conversational dialog with a robotic or virtual agent. In non-conversational interfaces, context features based on user-interface system events can improve detection of head gestures for dialog box confirmation or document browsing. Our user study
with prototype gesture-based components indicates quantitative and qualitative benefits of gesture-based confirmation over conventional alternatives. Using a discriminative approach to contextual prediction and multi-modal integration, head gesture detection performance improved with context features even when the topic of the test set differed significantly from that of the training set.
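
As a rough illustration of this kind of discriminative multi-modal integration, the following Python sketch fuses a per-frame score from a vision-based head-nod detector with a score from a contextual predictor using an off-the-shelf logistic regression. The variable names, the synthetic data, and the choice of classifier are assumptions for illustration, not the components used in the actual system.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical per-frame inputs (synthetic, for illustration only):
    #   vision_score  -- output margin of a vision-based head-nod detector
    #   context_score -- prediction from lexical/prosodic/timing context features
    rng = np.random.default_rng(0)
    n = 500
    vision_score = rng.normal(size=n)
    context_score = rng.normal(size=n)
    # Synthetic ground truth: nods are more likely when both scores are high.
    labels = (0.8 * vision_score + 0.6 * context_score
              + rng.normal(scale=0.5, size=n) > 0).astype(int)

    # Discriminative multi-modal integration: a linear classifier over the
    # stacked scores decides, frame by frame, whether a head nod occurred.
    X = np.column_stack([vision_score, context_score])
    clf = LogisticRegression().fit(X, labels)

    # At run time, a new frame's vision and context scores are fused into a
    # single posterior for the head-nod class.
    p_nod = clf.predict_proba([[1.2, 0.4]])[0, 1]
    print(f"P(head nod | vision, context) = {p_nod:.2f}")

The same two-score fusion idea applies whether the context score is derived from conversational dialogue events or from user-interface system events.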
Our Approach:
When interacting with a computer in a conversational setting, dialogue state can provide useful context for recognition. In the last decade, many embodied conversational agents (ECAs) have been developed for face-to-face interaction, using both physical robots and virtual avatars. A key component of these systems is the dialogue manager, which usually consists of a history of past events, the current discourse moves, and an agenda of future actions. The dialogue manager uses contextual information to decide which verbal or nonverbal action the agent should perform next (i.e., context-based synthesis). Contextual information has also proven useful for aiding speech recognition: in related work, a speech recognizer's grammar changes dynamically depending on the agent's previous action or utterance. In a similar fashion, we have developed a context-based visual recognition module that builds upon the contextual information available in the dialogue manager to improve the performance of visual feedback recognition (see Figure 1 below).

Figure 1: Contextual recognition of head gestures during face-to-face interaction with a conversational robot. In this scenario, contextual information from the robot's spoken utterance helps disambiguate the listener's visual gesture.
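
For concreteness, contextual features of this kind could be derived from the dialogue manager's most recent utterance as in the minimal sketch below; the specific lexical cues, the two-second timing window, and the function name are illustrative assumptions rather than the features used in the actual system.

    # Hypothetical context encoder: maps the agent's most recent utterance and
    # its timing into features that anticipate listener feedback, e.g. a head
    # nod is more likely shortly after a yes/no question or a request for
    # confirmation.
    def context_features(utterance, utterance_end_time, now):
        text = utterance.lower().strip()
        is_question = 1.0 if text.endswith("?") else 0.0
        asks_confirmation = 1.0 if any(
            cue in text for cue in ("ok?", "right?", "do you", "should i")) else 0.0
        # Visual feedback tends to occur within a short window after the
        # utterance ends; the 2-second threshold here is an arbitrary example.
        seconds_since_end = max(0.0, now - utterance_end_time)
        in_feedback_window = 1.0 if seconds_since_end < 2.0 else 0.0
        return [is_question, asks_confirmation, in_feedback_window]

    # Example: the robot has just finished asking a yes/no question.
    feats = context_features("Should I place the box on the table?",
                             utterance_end_time=10.0, now=10.8)
    print(feats)  # [1.0, 1.0, 1.0] -> strong contextual prior for a head nod

A feature vector of this kind can then be passed to the discriminative context predictor and fused with the vision-based detector as sketched earlier.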

Related Publications:
  1. L.-P. Morency, C. Sidner, C. Lee, and T. Darrell, Contextual Recognition of Head Gestures, Proceedings of the International Conference on Multimodal Interfaces (ICMI), 2005.
  2. L.-P. Morency, C. Sidner, C. Lee, and T. Darrell, Head Gestures for Perceptual Interfaces: The Role of Context in Improving Recognition, Artificial Intelligence, Elsevier, accepted for publication, 2006.