Natural Tasking of Robots Based on Human Interaction Cues

MIT Computer Science and Artificial Intelligence Laboratory
The Stata Center
32 Vassar Street
Cambridge, MA 02139
USA

PI: Rodney A. Brooks



[Project Overview], [Approach], [Research Questions], [Achieved Deliverables], [Future Deliverables], [People], [Publications]


Video Clips
Cog turns a crank
M4 robot head drawing
Kismet plays with a frog
Coco the gorilla robot

Publications

2004

2003

2002

2001

2000

1999


Presentations

2004

2003

2002

2001

2000

1999


Quicktime Movies

  • This video shows a 2 DOF active vision system and a 3 DOF prototype arm pushing a block off the table. It is a simple demonstration of the embedded behavior-based controller performing three behaviors: a zero-force, highly compliant mode; arm tracking of the target; and visual closed-loop control of the arm to poke the target. view quicktime movie button
  • This video shows a 2 DOF active vision system and a 3 DOF prototype arm tracking a simple target, demonstrating the integration of the visual system with the motor system on an embedded architecture. view quicktime movie button
  • Click to view a test prototype of a simple and scalable rotary series elastic actuator (SEA) that is compact and easy to build (a force-control sketch appears after this list). When the actuator is not controlled view quicktime movie button, it is quite stiff. When it is controlled at zero force view quicktime movie button, it complies with gravity. When it is operating under force control, the actuator moves view quicktime movie button but handles resistance appropriately. view quicktime movie button
  • This clip shows the arm's stiffness before and after Cog learns the implications of gravity. Before learning, the movements are less accurate and precise. During learning, Cog samples postures throughout its workspace and refines the force function it uses to supply feed-forward commands that posture its arm (a sketch of this kind of model appears after this list). Learning results in improved arm movement. view quicktime movie button [12.6MB]
  • In this clip, Cog's torso moves randomly under reflexive control. When an extreme of its range of motion is reached, Cog's model of pain is activated and the associated reflex is refined to reduce how far the movement goes toward that extreme. As adaptation proceeds, Cog learns to balance itself. view quicktime movie button [13.4MB]
  • Cog's arm and torso movements are displayed at the top of the screen. As Cog moves, the GUI shows the multi-joint muscle model overlaid on Cog's joints and how it behaves. The model itself can be modified through the GUI shown at the bottom of the screen. view quicktime movie button [7.1MB]
  • Cog's two-degree-of-freedom hand, equipped with tactile sensors, has a reflex that grasps and extends in a manner similar to that of primate infants. Contact inside the hand causes a short-term grasp; contact on the back of the hand causes the hand to stretch open. view quicktime movie button [6.4MB]
  • Cog is trying to identify its own arm. It generates a particular rhythmic arm movement and observes it visually. By correlating the visual signature of the motion with its own commands to move the arm, it forms a representation of the arm in the image (see the correlation sketch after this list). view quicktime movie button [6.9MB]
  • In these two videos Cog reaches for an object as identified by its visual attention system. It recognizes its own arm (shown in green) and identifies the arm endpoint (a small red square). When the object is contacted, the object's motion (differentiated from the arm's) is used as a cue for object segmentation. It's a block! view quicktime movie button [752KB] view quicktime movie button [744KB]
  • The M4 robot consists of an active vision robotic head integrated with a Magellan mobile platform. The robot integrates vision-based navigation with human-robot interaction. It runs a portable version of the attentional systems of Cog and Lazlo, with specific customizations for a thermal camera. Navigation, social preferences, and self-protection are governed by a model of motivational drives. Multi-tasking behaviors such as nighttime object detection, thermal-based navigation, heat detection, obstacle detection, and object reconstruction are based upon a competition model. view quicktime movie button [33.6MB]
  • Kismet has the ability to learn to recognize and remember people it interacts with. Such social competence leads to complex social behavior, such as cooperation, dislike or loyalty. Kismet has an online and unsupervised face recognition system, where the robot opportunistically collects, labels, and learns various faces while interacting with people, starting from an empty database. view quicktime movie button [47MB]
  • Kismet uses utterances as a way to manipulate its environment through the beliefs and actions of others. It has a vocal behavior system forming a pragmatic basis for higher level language acquisition. Protoverbal behaviors are influenced by the robot’s current perceptual, behavioral and emotional states. Novel words (or concepts) are created and managed. The vocal label for a concept is acquired and updated. view quicktime movie button [7.8MB]
  • This video clip shows the pose of a subject's head being tracked. The initial pose of the head is not known. Whenever the head is close to a frontal position, its pose can be determined accurately and tracking is reset. In this example, the mesh is shaded in two colors, showing where the left and right parts of the face are believed to be. view quicktime movie button [5.7MB]
  • This video clip shows part of a training session in which Kismet is taught the structure of a sorting task. The first part shows Kismet acquiring some task-specific vocabulary -- in this case, the word "yellow". The robot is then shown green objects being placed on one side, and yellow objects being placed on another. Throughout the task the presenter is commenting in the shared vocabulary. Towards the end of the video, Kismet makes predictions based on what the presenter says. view quicktime movie button [1.5MB]
  • This video clip shows an example of Cog mimicking the movement of a person. The visual attention system directs the robot to look and turn its head toward the person. Cog observes the movement of the person's hand, recognizes that movement as an animate stimulus, and responds by moving its own hand in a similar fashion. view quicktime movie button [100KB]
  • We have also tested the performance of this mimicry response with naive human instructors. In this case, the subject gives the robot the American Sign Language gesture for "eat", which the robot mimics back at the person. Note that the robot has no understanding of the semantics of this gesture; it is merely mirroring the person's action. view quicktime movie button [332KB]
  • The visual routines that track the moving object operate at 30 Hz, and can track multiple objects simultaneously. In this movie, Cog is interested in one of the objects being juggled. The robot attempts to imitate the parabolic trajectory of that object as it is thrown in the air and caught. view quicktime movie button [1MB]
  • Cog does not mimic every movement that it sees. Two types of social cues are used to indicate which moving object out of the many objects that the robot is tracking should be imitated. The first criterion is that the object display self-propelled movement. This eliminates objects that are either stationary or that are moving in ways that are explained by naive rules of physics. In this video clip, when the robot observes the ball moving down the ramp, Cog interprets the movement as linear and following gravity and ignores the motion. When the same stimulus moves against gravity and rolls uphill, the robot becomes interested and mimics its movement. view quicktime movie button [884KB]
  • The second social cue that the robot uses to pick out a moving trajectory is the attentional state of the instructor. (Whatever the instructor is looking at is assumed to be the most important part of the scene.) Although our robots currently lack the complex visual processing to determine the instructor's eye direction, we can accurately obtain the orientation of the instructor's head and use this information as an indicator of attention. In this movie, a large mirror has been placed behind the robot to allow the video camera to record both the robot's responses and the head orientation of the instructor. When the instructor looks to the left, the movement of his left arm becomes more salient and Cog responds by mimicking that motion. When the instructor looks to the right, his right arm movements are mimicked. view quicktime movie button [948KB]
  • This video clip demonstrates the simple ways that Cog interprets the intentions of the instructor. Note that unlike the other video clips, in this example, the instructor was given a specific sequence of tasks to perform in front of the robot. The instructor was asked to "get the robot's attention and then look over at the block". Cog responds by first fixating the instructor and then shifting its gaze to the block. The instructor was asked to again get the robot's attention and then to reach slowly for the block. Cog looks back at the instructor, observes the instructor moving toward the block, and interprets that the instructor might want the block. Although Cog has relatively little capability to assist the instructor in this case, we programmed the robot to attempt to reach for any target that the instructor became interested in. view quicktime movie button [632KB]
  • A video clip of Cog's new hand demonstrating various grasping behaviors. The two-degree-of-freedom hands use series elastic actuators and rapid-prototyping technology. view quicktime movie button [700KB]
  • A video clip of Cog's new force-control torso exhibiting virtual spring behavior. The ability to use virtual spring control on the torso allows for full body/arm integration and for safe human-robot interaction. view quicktime movie button [348KB]
  • This video clip shows the attentional/control system of Lazlo (same kinematics as in Cog's head). Visual processing uses color cues (detecting brightly colored blobs and skin-tone color), motion (optic flow and background subtraction), and binocular disparity (used to control vergence). Inertial (gyro-based) image stabilization against external disturbances is also shown. Particular care has been devoted to the design of the controller to obtain smooth and continuous movements. view quicktime movie button [1.3MB]
  • In this video clip, Kismet engages people in a proto-dialog. The robot does not speak any language; it babbles, so don't expect to understand what it's saying. The turn-taking dynamics are quite fluid and natural. The robot sends a variety of turn-taking cues through vocal pauses, gaze direction, and postural changes. The first segment is with two of Kismet's caregivers. The next two are with naive subjects. The last is edited from one long interaction. view quicktime movie button [9.3MB]
  • In this video clip, Kismet correctly interprets four classes of affective intent: praise, prohibition, attentional bids, and soothing. The clips were taken from cross-lingual studies with naive subjects. The robot's expressive feedback is readily interpreted by the subjects as well. view quicktime movie button [5.3MB]
  • In this video clip, Kismet says the phrase "Do you really think so?" with varying emotional qualities. In order, the emotional qualities correspond to calm, anger, disgust, fear, happiness, sadness, and interest. view quicktime movie button [3.6MB]
  • In this video clip, Kismet is searching for a toy. Its facial expression and eye movements make it readily apparent to an observer when the robot has discovered the colorful block on the stool. The attention system is always running, enabling the robot to respond appropriately to unexpected stimuli (such as the person entering from the right-hand side of the frame to take away the toy). Notice how Kismet appears a bit crestfallen when its toy is removed. view quicktime movie button [3.9MB]
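
The rotary SEA clips above describe three modes: stiff when uncontrolled, compliant at zero commanded force, and force-tracking under load. A series elastic actuator makes this possible by placing a spring between the motor and the output, so force can be measured as spring deflection and regulated in a loop. What follows is only a minimal sketch of such a loop, assuming a torsional spring model and a plain PD force controller; the constants, gains, and function names are illustrative and are not taken from the project's code.

    # Hypothetical sketch of force control for a rotary series elastic actuator (SEA).
    # Names (SPRING_K, sea_force_step, etc.) are illustrative, not the project's actual API.

    SPRING_K = 120.0    # N*m/rad, torsional stiffness of the elastic element (assumed)
    KP, KD = 8.0, 0.05  # force-loop gains (assumed)
    DT = 0.001          # 1 kHz control loop (assumed)

    def sea_force_step(desired_torque, motor_angle, load_angle, prev_error):
        """One cycle of the force loop: the spring deflection gives the measured
        torque; a PD law drives the motor to track the desired torque."""
        measured_torque = SPRING_K * (motor_angle - load_angle)
        error = desired_torque - measured_torque
        d_error = (error - prev_error) / DT
        motor_command = KP * error + KD * d_error  # sent to the motor amplifier
        return motor_command, error

    # Zero-force ("highly compliant") mode is simply desired_torque = 0: any external
    # push deflects the spring, the loop drives the error back toward zero, and the
    # joint yields to gravity or to a person's hand.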
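
The gravity-learning clip describes Cog sampling postures across its workspace and refining a force function that supplies feed-forward commands. A minimal sketch of that idea, assuming the system stores (posture, holding-torque) samples and interpolates among them, might look like the following; the class and method names are hypothetical and the nearest-neighbor averaging is just a stand-in for whatever function approximator the real system used.

    # Hypothetical sketch of a feed-forward gravity model of the kind described for Cog.
    import numpy as np

    class GravityModel:
        def __init__(self):
            self.postures = []  # joint-angle vectors visited during learning
            self.torques = []   # torques that held the arm still at each posture

        def add_sample(self, joint_angles, holding_torques):
            self.postures.append(np.asarray(joint_angles, dtype=float))
            self.torques.append(np.asarray(holding_torques, dtype=float))

        def feedforward(self, joint_angles, k=3):
            """Predict the holding torque at a new posture by averaging the k
            nearest sampled postures."""
            q = np.asarray(joint_angles, dtype=float)
            dists = [np.linalg.norm(q - p) for p in self.postures]
            nearest = np.argsort(dists)[:k]
            return np.mean([self.torques[i] for i in nearest], axis=0)

    # During a reach, the controller would add model.feedforward(current_angles) to the
    # feedback torque, so the feedback loop only has to correct small residual errors.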
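
The self-identification clip describes correlating the visual signature of a rhythmic arm movement with the motor commands that produced it. A rough sketch of one way to do that, assuming a per-pixel visual motion-energy signal is available, is shown below; the function name, threshold, and data layout are assumptions, not the project's implementation.

    # Hypothetical sketch of self-identification by motion correlation.
    import numpy as np

    def find_own_arm(motion_frames, commanded_speed, threshold=0.6):
        """motion_frames: array of shape (T, H, W) of visual motion energy.
        commanded_speed: array of shape (T,) with the magnitude of the arm command.
        Returns a boolean (H, W) mask of pixels whose motion tracks the command."""
        T, H, W = motion_frames.shape
        cmd = (commanded_speed - commanded_speed.mean()) / (commanded_speed.std() + 1e-9)
        pixels = motion_frames.reshape(T, H * W)
        pixels = (pixels - pixels.mean(axis=0)) / (pixels.std(axis=0) + 1e-9)
        correlation = (pixels * cmd[:, None]).mean(axis=0)  # per-pixel correlation
        return correlation.reshape(H, W) > threshold

    # Pixels that rise and fall in step with the rhythmic command are labelled as the
    # robot's own arm; everything else is treated as the external world.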

These three movies show the visual attention system in action (a sketch of how the saliency channels might be combined follows the clips):

  • This clip illustrates the color saliency process. The left frame is the video signal; the right frame shows how the colorful block is particularly salient. The middle frame shows the raw saliency value due to color. The bright region in the center is the habituation influence. view quicktime movie button [6.8MB]
  • This clip illustrates the motion saliency process. view quicktime movie button [5.9MB]
  • This clip illustrates the face saliency process (center) and the habituation process (right). view quicktime movie button [14.3MB]
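
These clips show individual saliency channels (color, motion, face) together with a habituation map. One common way to turn such channels into a single attention target is a weighted sum with habituation subtracted off, roughly as sketched below; the weights, names, and exact combination rule here are illustrative rather than Kismet's actual attention code.

    # Hypothetical sketch of combining saliency channels into an attention target.
    import numpy as np

    def attention_target(color_map, motion_map, face_map, habituation_map,
                         weights=(1.0, 1.0, 1.0)):
        """Each *_map is an (H, W) array in [0, 1]. Habituation suppresses the region
        the robot has been staring at, so attention eventually moves on."""
        w_color, w_motion, w_face = weights
        saliency = (w_color * color_map
                    + w_motion * motion_map
                    + w_face * face_map
                    - habituation_map)
        y, x = np.unravel_index(np.argmax(saliency), saliency.shape)
        return (x, y), saliency

    # In the full system the channel gains are not fixed: the behavior and motivation
    # systems adjust them (e.g., raising the face gain when the robot is seeking social
    # interaction), which the constant weights in this sketch do not capture.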

