Context-based Visual Feedback Recognition
Principal Investigators:

Goal:
Head pose and gesture offer several conversational grounding cues and are used extensively in face-to-face interaction among people. To recognize visual feedback efficiently, humans often use contextual knowledge from previous and current events to anticipate when feedback is most likely to occur. In this research we describe how contextual information can be used to predict visual feedback and improve recognition of head gestures in human-computer interfaces. Lexical, prosodic, timing, and gesture features can be used to predict a user's visual feedback during conversational dialog with a robotic or virtual agent. In non-conversational interfaces, context features based on user-interface system events can improve detection of head gestures for dialog box confirmation or document browsing. Our user study with prototype gesture-based components indicates quantitative and qualitative benefits of gesture-based confirmation over conventional alternatives. Using a discriminative approach to contextual prediction and multi-modal integration, head gesture detection improved with context features even when the topic of the test set differed significantly from that of the training set.
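As an illustration of this discriminative approach, the Python sketch below fuses the margin of a vision-based head-gesture detector with a linear contextual predictor computed from lexical, timing, and user-interface features. The feature names, weights, and fusion rule are illustrative assumptions, not the published system; in practice the predictor and the fusion weights would be learned offline from labeled interaction data (e.g., with an SVM).

import numpy as np

# Hypothetical context features computed just before the current video
# frame; the names and cues are illustrative, not taken from the paper.
def context_features(ends_in_question, ms_since_agent_utterance, dialog_box_open):
    return np.array([
        1.0 if ends_in_question else 0.0,            # lexical cue
        np.exp(-ms_since_agent_utterance / 1000.0),  # timing cue (recency)
        1.0 if dialog_box_open else 0.0,             # UI system-event cue
    ])

# Weights of a linear discriminative context predictor; assumed values
# standing in for parameters that would be trained offline.
W_NOD = np.array([1.2, 0.8, 1.5])
B_NOD = -1.0

def context_score(x):
    """Margin of the contextual predictor for an upcoming head nod."""
    return float(W_NOD @ x + B_NOD)

def fused_detection(vision_margin, x, alpha=0.6, threshold=0.0):
    """Late multi-modal fusion: combine the vision-based detector's margin
    with the contextual prediction, then threshold the weighted sum."""
    return alpha * vision_margin + (1.0 - alpha) * context_score(x) > threshold

# A weak visual nod hypothesis right after the agent asks a yes/no question:
x = context_features(True, 300.0, False)
print(fused_detection(vision_margin=0.1, x=x))  # context pushes it over

Thresholding a weighted sum is the simplest possible fusion rule; it is enough to show how context can rescue a marginal visual detection that the vision channel alone would reject.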
Our Approach:
When interacting with a computer in a conversational setting, dialog state can provide useful context for recognition. In the last decade, many embodied conversational agents (ECAs) have been developed for face-to-face interaction, using both physical robots and virtual avatars. A key component of these systems is the dialogue manager, which usually maintains a history of past events, the current discourse moves, and an agenda of future actions. The dialogue manager uses this contextual information to decide which verbal or nonverbal action the agent should perform next (i.e., context-based synthesis). Contextual information has also proven useful for aiding speech recognition: in related work, a speech recognizer's grammar changes dynamically depending on the agent's previous action or utterance. In a similar fashion, we have developed a context-based visual recognition module that builds upon the contextual information available in the dialogue manager to improve the performance of visual feedback recognition (see Figure 1 below).
Figure 1: Contextual recognition of head gestures during face-to-face interaction with a conversational robot. In this scenario, contextual information from the robot's spoken utterance helps disambiguate the listener's visual gesture.
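To make the module's interface with the dialogue manager concrete, here is a minimal Python sketch in which contextual events (the agent's utterances) lower the gesture detector's decision threshold while listener feedback is expected. The class names, the lexical cue, the 2-second window, and the threshold values are assumptions for illustration, not the published architecture.

from dataclasses import dataclass, field
import time

@dataclass
class DialogueState:
    """Minimal stand-in for a dialogue manager's context (illustrative)."""
    history: list = field(default_factory=list)
    awaiting_feedback: bool = False
    last_utterance_time: float = 0.0

    def agent_says(self, utterance):
        self.history.append(utterance)
        self.last_utterance_time = time.time()
        # Naive lexical cue: questions invite a nod or a head shake.
        self.awaiting_feedback = utterance.rstrip().endswith("?")

class ContextualGestureRecognizer:
    """Lowers the detection threshold when the dialogue context makes
    listener feedback likely (a sketch, not the published system)."""
    def __init__(self, base_threshold=0.8, context_bonus=0.3):
        self.base_threshold = base_threshold
        self.context_bonus = context_bonus

    def detect(self, vision_score, state):
        threshold = self.base_threshold
        # Expect feedback for a short window after a question is asked.
        if state.awaiting_feedback and time.time() - state.last_utterance_time < 2.0:
            threshold -= self.context_bonus
        return vision_score >= threshold

state = DialogueState()
recognizer = ContextualGestureRecognizer()
state.agent_says("Should I deliver the package now?")
print(recognizer.detect(vision_score=0.6, state=state))  # True only in context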
Related Publications:
- Louis-Philippe Morency, Candace Sidner, Christopher Lee, and Trevor Darrell, "Contextual Recognition of Head Gestures," Proceedings of the International Conference on Multimodal Interfaces (ICMI), 2005.
- Louis-Philippe Morency, Candace Sidner, Christopher Lee, and Trevor Darrell, "Head Gestures for Perceptual Interfaces: The Role of Context in Improving Recognition," Artificial Intelligence, Elsevier, accepted for publication, 2006.