- Detect prosody in human speech and show appropriate facial responses
We have demonstrated a robust technique for
recognizing affective intent in robot-directed speech. By analyzing
the prosody of a person's speech, Kismet can determine whether it is being
praised, prohibited, soothed, or given an attentional bid. The robot
can distinguish these affective intents from neutral robot-directed
speech. The output of the recognizer modulates the robot's emotional
models, inducing an appropriate affective state with a corresponding
facial expression (an expression of happiness when praised, sorrow
when prohibited, interest when alerted, and a relaxed expression for
soothing). In multi-lingual experiments with naive female subjects, we
found that the robot was able to robustly classify the four affective
intents. In addition, the subjects intuitively inferred from Kismet's
expressive feedback when their intent had been properly understood.
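To make the approach concrete, here is a minimal sketch of a prosody-based
intent classifier in Python. The feature set (pitch and energy statistics)
follows the work described above, but the thresholds and decision rules are
illustrative stand-ins, not the trained model used on the robot:

    import numpy as np

    def classify_affective_intent(f0, energy):
        """Label an utterance with one of Kismet's four affective intents,
        given its pitch contour (f0, in Hz, one value per frame; zero for
        unvoiced frames) and its frame-level energy (normalized to 0..1).
        The thresholds below are illustrative, not the trained model."""
        voiced = f0[f0 > 0]                  # keep voiced frames only
        pitch_mean = float(np.mean(voiced))
        pitch_var = float(np.var(voiced))
        energy_mean = float(np.mean(energy))

        if energy_mean > 0.7 and pitch_mean < 200:
            return "prohibition"             # low pitch, high intensity
        if pitch_var > 2000 and energy_mean > 0.5:
            return "praise"                  # exaggerated pitch excursions
        if pitch_mean > 300:
            return "attentional bid"         # high, rising pitch
        if pitch_var < 500 and energy_mean < 0.4:
            return "soothing"                # low, smooth, quiet contour
        return "neutral"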
Watch it in action (Quicktime movie):
In this video clip, Kismet correctly interprets four classes of affective
intent: praise, prohibition, attentional bids, and soothing. These were
taken from cross-lingual studies with naive subjects. The robot's
expressive feedback is readily interpreted by the subjects as well. [5.3MB]
For more information see:
Cynthia Breazeal and Lijin Aryananda. "Recognition of Affective
Communicative Intent in Robot-Directed Speech". Submitted to the
IEEE-RAS International Conference on Humanoid Robots 2000.
Cynthia Breazeal. "Sociable Machines: Expressive Social Exchange
Between Humans and Robots". Massachusetts Institute of Technology,
Department of Electrical Engineering and Computer Science, PhD Thesis,
May 2000. [SEE CHAPTER 7]
[6.5MB]
[20.7MB compressed]
- Expressive feedback through face, voice, and body posture
We have implemented expressive feedback in
multiple modalities on Kismet. The robot is able to express itself
through voice, facial expression, and body posture. We have evaluated
the readability of Kismet's expressions for anger, disgust, fear,
happiness, interest, sorrow, surprise, and some interesting blends
through numerous studies with naive human subjects.
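As a rough sketch of how a single affective state can drive all of these
channels at once, the snippet below blends basis facial postures by their
distance from the current point in a three-dimensional affect space
(arousal, valence, stance), in the spirit of the thesis; the anchor points
and the weighting scheme are simplified stand-ins for illustration:

    # Kismet generates expressions by interpolating basis facial postures
    # arranged in a three-dimensional affect space (arousal, valence,
    # stance).  These anchor points are illustrative, not the robot's
    # actual tables.
    BASIS_POSTURES = {
        "happiness": ( 0.5,  1.0,  0.5),
        "sorrow":    (-0.5, -1.0,  0.0),
        "anger":     ( 1.0, -1.0,  1.0),
        "fear":      ( 1.0, -0.5, -1.0),
        "interest":  ( 0.5,  0.5,  1.0),
        "calm":      ( 0.0,  0.0,  0.0),
    }

    def blend_weights(arousal, valence, stance):
        """Weight each basis posture by inverse distance from the current
        affective state; nearby postures dominate the blended face."""
        raw = {}
        for name, (a, v, s) in BASIS_POSTURES.items():
            d = ((arousal - a)**2 + (valence - v)**2
                 + (stance - s)**2) ** 0.5
            raw[name] = 1.0 / (d + 1e-6)
        total = sum(raw.values())
        return {name: w / total for name, w in raw.items()}

Because voice and body posture can be driven from the same affect-space
point, the three channels stay consistent with one another.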
Watch it in action (Quicktime movie):
In this video clip, Kismet says the phrase "Do you really think so?" with
varying emotional qualities. In order, the qualities are calm, anger,
disgust, fear, happiness, sadness, and interest. [3.6MB]
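Vocal affect of this kind is typically realized by scaling a few speech
synthesizer settings relative to the calm voice. The table below sketches
plausible directions (anger loud and fast, sadness low and slow); the
specific values are hypothetical, not Kismet's actual parameters, which
are documented in the thesis:

    # Hypothetical multipliers relative to the calm voice; Kismet's real
    # synthesizer settings are described in the thesis.
    VOICE_SETTINGS = {
        # quality:   (pitch baseline, pitch range, speech rate, loudness)
        "calm":      (1.00, 1.00, 1.00, 1.00),
        "anger":     (0.90, 1.40, 1.20, 1.40),
        "disgust":   (0.85, 0.90, 0.90, 1.00),
        "fear":      (1.30, 1.50, 1.40, 1.10),
        "happiness": (1.20, 1.40, 1.10, 1.10),
        "sadness":   (0.80, 0.70, 0.80, 0.80),
        "interest":  (1.10, 1.20, 1.05, 1.00),
    }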
For more information see:
Cynthia Breazeal and Brian Scassellati. "How to Build Robots
That Make Friends and Influence People". To appear in IROS99, Kyongju,
Korea, 1999.
Cynthia Breazeal. "Sociable Machines: Expressive Social Exchange
Between Humans and Robots". Massachusetts Institute of Technology,
Department of Electrical Engineering and Computer Science, PhD Thesis,
May 2000. [SEE CHAPTERS 11 AND 12]
[6.5MB]
[20.7MB compressed]
- Visual attention and gaze direction
We have implemented a visual attention system on
Cog and Kismet based on Jeremy Wolfe's model of human visual search.
We have tested the robustness of the attention system on these robots.
Because the robot's visual system is tuned to what humans find to be
inherently salient, its attention is often drawn to the same sorts of
stimuli that draw human attention. In studies with naive subjects, we
found that people intuitively use natural attention-grabbing cues
(motion, proximity, etc.) to quickly direct the robot's attention. The
subjects intuitively use the robot's gaze and smooth pursuit behavior
to determine when they have successfully directed the robot's
attention.
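A minimal sketch of the combination step appears below: bottom-up feature
maps are weighted, summed, and suppressed by a habituation map, and the
robot directs its gaze at the peak. The map names and weights here are
illustrative; in the actual system the weights are modulated by the
robot's motivational state (for example, boosting face saliency when the
robot is seeking social stimuli):

    import numpy as np

    def attention_target(color, motion, face, habituation, weights):
        """Combine bottom-up feature maps (2-D arrays on a common grid)
        into one saliency map and return the most salient location.
        The weights are illustrative; the real system adjusts them
        according to the robot's motivational state."""
        saliency = (weights["color"] * color
                    + weights["motion"] * motion
                    + weights["face"] * face
                    - habituation)           # suppress stale targets
        y, x = np.unravel_index(np.argmax(saliency), saliency.shape)
        return (x, y), saliency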
Watch it in action (Quicktime movies):
In this video clip, Kismet is searching for a toy. Its facial
expression and eye movements make it readily apparent to an observer
when the robot has discovered the colorful block on the stool. The
attention system runs continuously, enabling the robot to respond
appropriately to unexpected stimuli (such as the person entering from
the right-hand side of the frame to take away the toy). Notice how
Kismet appears a bit crestfallen when its toy is removed. [3.9MB]
These three movies show the visual attention system in action:
This clip illustrates the color saliency process. The left frame
is the video signal; the middle frame shows the raw saliency values
due to color; the right frame shows how the colorful block is
particularly salient. The bright region in the center is the
habituation influence. [6.8MB]
This clip illustrates the motion saliency process. [5.9MB]
This clip illustrates the face saliency process (center) and the
habituation process (right). [14.3MB]
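The habituation process shown in the last clip can be sketched as a
decaying map that grows under the current gaze point, so a fixated target
gradually loses saliency and the robot moves on. The Gaussian footprint
and the constants below are illustrative choices, not the robot's values:

    import numpy as np

    def update_habituation(habituation, gaze_xy, decay=0.95,
                           gain=0.1, sigma=20.0):
        """One update step: the whole map decays toward zero while a
        Gaussian bump accumulates at the fixated location (gaze_xy,
        in pixels).  Subtracting this map from the combined saliency
        map makes the robot lose interest in a fixated target."""
        h, w = habituation.shape
        ys, xs = np.mgrid[0:h, 0:w]
        gx, gy = gaze_xy
        bump = np.exp(-((xs - gx)**2 + (ys - gy)**2) / (2 * sigma**2))
        return decay * habituation + gain * bump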
For more information see:
Cynthia Breazeal and Brian Scassellati. "A Context-dependent
Attention System for a Social Robot". In Proceedings of the Sixteenth
International Joint Conference on Artificial Intelligence (IJCAI99),
Stockholm, Sweden, pp.1146-1151, 1999.
Cynthia Breazeal. "Sociable Machines: Expressive Social Exchange
Between Humans and Robots". Massachusetts Institute of Technology,
Department of Electrical Engineering and Computer Science, PhD Thesis,
May 2000. [SEE CHAPTERS 6 AND 13]
[6.5MB]
[20.7MB compressed]
Cynthia Breazeal, Aaron Edsinger, Paul Fitzpatrick and Brian
Scassellati. "Social Constraints on Animate Vision". submitted to the
IEEE-RAS International Conference on Humanoid Robots 2000.