Our group develops algorithms and systems for perceptive interfaces, which let users interact with machines through natural expression and gesture and let machines understand a user’s physical environment. We develop computer vision algorithms to support two forms of interaction: first, enabling machines to interact with people through multimodal conversation, and second, allowing devices to recognize objects of interest to a user and provide situated search for information about those objects. Enabling machines to understand multimodal communication and reference is valuable in many application areas. Our projects fall roughly into three technical topics:
  • multimodal stream processing,
  • estimation of human body pose and gesture, and
  • matching and recognition of scenes, objects, and object categories.
These topics are described in more detail on the project page and in the papers listed on the publications page.