Recognition of affective intent in speech
Human speech provides a natural and intuitive interface both for communicating with humanoid robots and for teaching them. To this end, Kismet recognizes and responds affectively to praise, prohibition, attention, and comfort in robot-directed speech. These affective intents are well matched to human-style instruction scenarios, since praise, prohibition, and directing the robot's attention to relevant aspects of a task could all be used intuitively to train a robot.
The system runs in real time and performs robustly in the sense that matters for a teaching task: confusing a strongly valenced intent with a neutrally valenced one is far less harmful than confusing it with an oppositely valenced one. For instance, mistaking approval for an attentional bid, or prohibition for neutral speech, is better than interpreting a prohibition as praise. Communicative efficacy has been tested and demonstrated in multilingual studies with the robot's caregivers as well as with naive subjects (so far, only female subjects have been tested). Importantly, we have discovered some intriguing social dynamics that arise between robot and human when expressive feedback is introduced. This expressive feedback plays an important role in facilitating natural and intuitive human-robot communication.
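The robustness criterion above amounts to an asymmetric error cost. The Python sketch below encodes it so that two classifiers with equal accuracy can still be compared by how badly they err. The valence assignments (in particular, treating comfort as positive and attentional bids as neutral) and the "1 plus valence gap" weighting are assumptions made for illustration; they are not the scoring used in the Kismet studies.

```python
from enum import Enum


class Intent(Enum):
    PRAISE = "praise"
    PROHIBITION = "prohibition"
    COMFORT = "comfort"
    ATTENTION = "attention"
    NEUTRAL = "neutral"


# Hypothetical valence per intent: +1 positive, 0 neutral, -1 negative.
VALENCE = {
    Intent.PRAISE: 1,
    Intent.COMFORT: 1,      # assumption: soothing speech counted as positive
    Intent.ATTENTION: 0,    # assumption: attentional bids counted as neutral
    Intent.NEUTRAL: 0,
    Intent.PROHIBITION: -1,
}


def confusion_cost(true: Intent, predicted: Intent) -> int:
    """Hypothetical cost: 0 when correct, otherwise 1 plus the valence gap.

    Same-valence mix-ups cost 1, strong-vs-neutral mix-ups cost 2,
    and opposite-valence mix-ups (e.g. prohibition read as praise) cost 3.
    """
    if true == predicted:
        return 0
    return 1 + abs(VALENCE[true] - VALENCE[predicted])


def total_cost(pairs):
    """Sum the cost over (true, predicted) pairs from an evaluation run."""
    return sum(confusion_cost(t, p) for t, p in pairs)
```

Under this weighting, a recognizer whose mistakes stay on the neutral side accumulates a lower total cost than one that flips valence, even when both make the same number of mistakes.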
Infant recognition of affective intent
Developmental psycholinguists have extensively studied how affective intent is communicated to preverbal infants. Infant-directed speech is typically quite exaggerated in pitch and intensity. Based on a series of cross-cultural studies, Anne Fernald suggests that much of this information is communicated through the "melody" of infant-directed speech. In particular, there is evidence for at least four distinctive prosodic contours, each of which communicates a different affective meaning to the infant: approval, prohibition, comfort, and attention (see figure). Maternal exaggerations in infant-directed speech seem to be particularly well matched to the innate affective responses of human infants.
Recognition of affective intent
Inspired by this work, we have implemented a recognizer that distinguishes the four affective intents of praise, prohibition, comfort, and attentional bids. Of course, not everything a human says to Kismet carries an affective meaning, so we also distinguish neutral robot-directed speech. We intentionally designed Kismet to resemble a very young creature so that people are naturally inclined to speak to it with appropriately exaggerated prosody. This aesthetic choice has paid off nicely. As shown below, the preprocessed pitch contours of labeled utterances resemble Fernald's prototypical prosodic contours for approval, attention, prohibition, and comfort/soothing.
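To make the idea of telling intents apart by pitch-contour shape concrete, here is a toy Python sketch. It assumes a contour arrives as per-frame pitch estimates in Hz with unvoiced frames marked 0, summarizes it with four hypothetical global features (mean, standard deviation, range, and least-squares slope of pitch), standardizes them, and assigns the label of the nearest class centroid. The feature set and the nearest-centroid rule are stand-ins chosen for brevity; this is not Kismet's actual recognizer.

```python
import math
from statistics import fmean, pstdev


def contour_features(pitch_hz):
    """Summarize a pitch contour given in Hz per frame (0 = unvoiced frame).

    Returns [mean pitch, pitch std dev, pitch range, least-squares slope].
    """
    voiced = [p for p in pitch_hz if p > 0]  # drop unvoiced frames
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    n = len(voiced)
    x_mean = (n - 1) / 2
    y_mean = fmean(voiced)
    # Least-squares slope of pitch against frame index (Hz per frame).
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(voiced))
    den = sum((x - x_mean) ** 2 for x in range(n))
    slope = num / den
    return [y_mean, pstdev(voiced), max(voiced) - min(voiced), slope]


class NearestCentroid:
    """Toy classifier: z-score the features, then pick the closest class mean."""

    def fit(self, contours, labels):
        feats = [contour_features(c) for c in contours]
        k = len(feats[0])
        self.mu = [fmean(f[i] for f in feats) for i in range(k)]
        # Fall back to 1.0 so a constant feature does not divide by zero.
        self.sd = [pstdev([f[i] for f in feats]) or 1.0 for i in range(k)]
        by_label = {}
        for f, y in zip(feats, labels):
            by_label.setdefault(y, []).append(self._scale(f))
        # Each centroid is the per-feature mean of that label's scaled vectors.
        self.centroids = {
            y: [fmean(col) for col in zip(*vecs)] for y, vecs in by_label.items()
        }
        return self

    def _scale(self, f):
        return [(v - m) / s for v, m, s in zip(f, self.mu, self.sd)]

    def predict(self, contour):
        z = self._scale(contour_features(contour))
        return min(self.centroids, key=lambda y: math.dist(z, self.centroids[y]))
```

Standardizing first matters because raw mean pitch (hundreds of Hz) would otherwise swamp the slope feature (a few Hz per frame) in the distance computation.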
For Kismet, the output of the vocal affective intent classifier feeds into the emotion subsystem, where it is appraised at an affective level and then used to directly modulate the robot's own affective state. In this way, the affective meaning of the utterance is communicated to the robot through a mechanism similar to the one Fernald suggests. The robot's current "emotive" state is reflected by its facial expression and body posture. This affective response gives the human critical feedback about whether the robot properly understood their intent. As with human infants, socially manipulating the robot's affective system is a powerful way to modulate the robot's behavior and to elicit an appropriate response. The video segment on this page illustrates these points.
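One way to picture how a recognized intent could modulate an affective state that is then shown as feedback: the Python sketch below keeps a hypothetical two-dimensional state (valence, arousal), nudges it by a fixed appraisal for each recognized intent, lets it decay toward neutral between utterances, and maps the result to a coarse expression label. The appraisal table, decay factor, thresholds, and expression names are all illustrative assumptions, not Kismet's emotion system.

```python
from dataclasses import dataclass

# Hypothetical appraisal table: intent -> (valence change, arousal change).
APPRAISAL = {
    "praise": (0.6, 0.3),
    "comfort": (0.3, -0.4),   # assumption: soothing lowers arousal
    "attention": (0.0, 0.5),
    "prohibition": (-0.6, 0.2),
    "neutral": (0.0, 0.0),
}


def clamp(x, lo=-1.0, hi=1.0):
    """Keep a state dimension inside [lo, hi]."""
    return max(lo, min(hi, x))


@dataclass
class AffectState:
    valence: float = 0.0
    arousal: float = 0.0
    decay: float = 0.9  # hypothetical: fraction of the old state kept per utterance

    def appraise(self, intent: str) -> None:
        """Decay the current state toward neutral, then add the intent's appraisal."""
        dv, da = APPRAISAL[intent]
        self.valence = clamp(self.decay * self.valence + dv)
        self.arousal = clamp(self.decay * self.arousal + da)

    def expression(self) -> str:
        """Coarse, hypothetical expression label the human can read as feedback."""
        if self.valence > 0.3:
            return "happy"
        if self.valence < -0.3:
            return "sad"
        if self.arousal > 0.3:
            return "interested"
        if self.arousal < -0.3:
            return "calm"
        return "neutral"
```

The decay term is a design choice: without it, a single burst of praise would leave the robot looking pleased indefinitely, and its expression would stop telling the speaker how the most recent utterance was understood.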