Low-level Visual Features

Kismet's low-level visual perception system extracts a number of features that human infants seem to be particularly responsive to. These low-level features were selected for their ability to help Kismet distinguish social stimuli (i.e., people, detected on the basis of skin tone, eye detection, and motion) from non-social stimuli (i.e., toys, detected on the basis of saturated color and motion), and to interact with these stimuli in interesting ways (often modulated by the distance of the target stimulus from the robot). A few perceptual abilities serve self-protection responses: detecting looming stimuli as well as potentially dangerous stimuli (characterized by excessive motion close to the robot). Kismet's low-level visual features are as follows:
Motion Saliency Map

In parallel with the color saliency computations, another processor receives input images from the frame grabber and computes temporal differences to detect motion. Motion detection is performed on the wide field of view, which is often at rest since it does not move with the eyes. The raw motion map is then smoothed. The result is a binary 2-D map in which regions corresponding to motion have a high intensity value.
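The computation can be sketched as a simple frame-differencing pipeline. The smoothing kernel size and threshold below are illustrative assumptions, not Kismet's actual parameters:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def motion_saliency(prev_frame, curr_frame, smooth_size=5, threshold=15.0):
    """Illustrative motion saliency map from two consecutive grayscale frames."""
    # Temporal difference between consecutive frames from the wide camera.
    diff = np.abs(curr_frame.astype(np.float32) - prev_frame.astype(np.float32))
    # Smooth the raw motion map to suppress pixel-level noise.
    smoothed = uniform_filter(diff, size=smooth_size)
    # Binarize: regions corresponding to motion receive a high value.
    return (smoothed > threshold).astype(np.uint8) * 255
```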
Skin Tone Map
Eye Detection

The eye detector was developed by Aaron Edsinger (edsinger@ai.mit.edu).
Proximity
Loom Detection

The loom calculation makes use of the two cameras with wide fields of view. These cameras are parallel to each other, so when there is nothing in view that is close to the cameras (relative to the distance between them), their outputs tend to be very similar. A close object, on the other hand, projects very differently onto the two cameras, leading to a large difference between the two views. By simply summing the pixel-by-pixel differences between the images from the two cameras, we extract a measure which becomes large in the presence of a close object. Since Kismet's wide cameras are quite far from each other, much of the room and furniture is close enough to introduce a component into the measure that changes as Kismet looks around. To compensate for this, the measure is subject to rapid habituation. This has the side effect that a slowly approaching object will not be detected, which is perfectly acceptable for a loom response.
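A minimal sketch of this measure, assuming grayscale frames of the same size from the two wide cameras; the habituation rate and trigger threshold are illustrative values, not the ones used on the robot:

```python
import numpy as np

class LoomDetector:
    """Illustrative loom measure: summed pixel differences between the two
    wide-FoV cameras, with rapid habituation to subtract the baseline
    contributed by the room and furniture."""

    def __init__(self, habituation_rate=0.5, threshold=1e6):
        self.baseline = 0.0
        self.habituation_rate = habituation_rate
        self.threshold = threshold

    def update(self, left_frame, right_frame):
        # A close object projects very differently onto the two parallel
        # cameras, so the summed absolute difference grows.
        measure = np.abs(left_frame.astype(np.float32)
                         - right_frame.astype(np.float32)).sum()
        # Rapid habituation: track the recent baseline so slow changes
        # (looking around the room, slowly approaching objects) are ignored.
        loom = max(0.0, measure - self.baseline)
        self.baseline += self.habituation_rate * (measure - self.baseline)
        return loom > self.threshold
```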
Threat Detection

A nearby object (as computed above) along with large but concentrated movement in the wide field of view is treated as a threat by Kismet. The amount of motion corresponds to the amount of activation of the motion map. Since the motion map may also become very active during ego-motion, this response is disabled for the brief intervals during which Kismet's head is in motion. As an additional filtering stage, the ratio of activation in the peripheral part of the image versus the central part is computed to help reduce the number of spurious threat responses due to ego-motion. This filter thus looks for concentrated activation in a localized region of the motion map, whereas self-induced motion causes activation to smear evenly over the entire map.
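The filter can be sketched as follows, assuming the binary motion map from above (scaled to 0/1); the central/peripheral split and both thresholds are illustrative assumptions:

```python
import numpy as np

def threat_detected(motion_map, near_object, head_moving,
                    min_activation=0.2, concentration_ratio=2.0):
    """Illustrative threat filter: a nearby object plus large, localized motion."""
    # The response is disabled while the head moves (ego-motion), and it
    # requires that the proximity/loom computation has flagged a nearby object.
    if head_moving or not near_object:
        return False
    # The amount of motion is the overall activation of the motion map.
    if motion_map.mean() < min_activation:
        return False
    # Compare area-normalized activation in the central region against the
    # periphery. Ego-motion smears activation evenly (ratio near 1), whereas a
    # nearby moving object concentrates it in one region (ratio far from 1).
    h, w = motion_map.shape
    center = motion_map[h // 4: 3 * h // 4, w // 4: 3 * w // 4]
    periphery_sum = motion_map.sum() - center.sum()
    periphery_density = periphery_sum / (motion_map.size - center.size)
    ratio = (periphery_density + 1e-9) / (center.mean() + 1e-9)
    return ratio > concentration_ratio or ratio < 1.0 / concentration_ratio
```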
Low-level Auditory Features

Kismet's low-level auditory perception system extracts a number of features that are also useful for distinguishing people from other sound-emitting objects such as rattles, bells, and so forth. The software runs in real time and was developed at MIT by the Spoken Language Systems Group (www.sls.lcs.mit.edu/sls). Jim Glass and Lee Hetherington were tremendously helpful in tailoring the code for our specific needs and in assisting us in porting this sophisticated speech recognition system to Kismet. The software delivers a variety of information that is used to distinguish speech-like sounds from non-speech sounds, to recognize vocal affect, and to regulate vocal turn-taking behavior. The phonemic information may ultimately be used to shape the robot's own vocalizations during imitative vocal games, and to enable the robot to acquire a proto-language from long-term interactions with human caregivers. Kismet's low-level auditory features are as follows: