Current Projects

START System

The START Natural Language System understands and generates natural language and answers questions posed to it in natural language. As a question-answering system, START parses incoming questions, matches the queries created from the parse trees against its knowledge base, and presents the appropriate information segments to the user, providing "just the right information".
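The parse-match-retrieve pipeline described above can be illustrated with a toy sketch. The regex "parser", the knowledge base of subject-relation-object assertions, and the function names below are hypothetical stand-ins for illustration, not the actual START implementation.

```python
# Toy parse-match-retrieve pipeline (hypothetical, not START itself).
import re

# A tiny knowledge base of (subject, relation, object) assertions.
KB = [
    ("START", "answer", "questions"),
    ("START", "parse", "questions"),
]

def parse_question(question):
    """Turn a 'What does X Y?'-style question into a query triple."""
    m = re.match(r"What does (\w+) (\w+)\?", question)
    if m:
        subject, relation = m.groups()
        return (subject, relation, None)  # None marks the unknown slot
    return None

def answer(question):
    """Parse the question, match the query against KB, return the filler."""
    query = parse_question(question)
    if query is None:
        return "I don't understand the question."
    subject, relation, _ = query
    for s, r, o in KB:
        if s == subject and r == relation:
            return o
    return "Unknown."

print(answer("What does START parse?"))  # prints "questions"
```

Matching queries against stored assertions, rather than raw text, is what lets such a system return only "the right information" instead of whole documents.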

Boris Katz, Sue Felshin

Detailed Interpretation of Object Images

The goal of this project is to model the process of 'full interpretation' of object images, namely the ability to identify and localize all semantic features and parts that are recognized by human observers. Our approach is based on interpreting multiple reduced but still interpretable local regions that comprise the complete object. In such reduced regions, interpretation is simpler, since the number of semantic components is small and the variability of possible configurations is low. To identify useful components and relations used in the local interpretation process, we consider the interpretation of 'minimal configurations', which are reduced local regions that are minimal in the sense that further (small) reduction will make them unrecognizable and uninterpretable. We study implications of 'full interpretation' for difficult visual tasks, such as recognizing actions and social interactions.
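The search for a minimal configuration can be sketched as a simple reduction loop: shrink a local region step by step for as long as it remains recognizable. The `recognizable` predicate below is a hypothetical stand-in for a human or model recognition judgment, and the sizes are invented for illustration.

```python
# Sketch of the 'minimal configuration' idea: reduce a region until
# any further (small) reduction makes it unrecognizable.

def recognizable(region_size):
    # Hypothetical stand-in: assume recognition fails below a critical size.
    return region_size >= 12

def minimal_configuration(initial_size, step=2):
    """Shrink the region in small steps while it stays recognizable."""
    size = initial_size
    while recognizable(size - step):
        size -= step
    return size  # minimal: one more reduction breaks recognition

print(minimal_configuration(20))  # prints 12
```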

Guy Ben-Yosef

Language Learning from a Computational Perspective

Language learning is a fundamental aspect of human intelligence. Our project takes an interdisciplinary approach to this topic, integrating methodologies and theories from linguistics and psychology with modern computational modelling tools from Natural Language Processing (NLP) and computational linguistics. We conduct empirical studies of language production (writing) as well as language comprehension (reading), with particular focus on the role of a first language in second language acquisition and processing. We harness insights from such studies to develop NLP applications for learner language.

Yevgeni Berzak

Inferring “Theory of Mind” with vision and language

Human agents are known for their "Theory of Mind" (ToM): the ability to infer others' mental states, e.g., intentions and beliefs. Some early studies suggest that language development helps ToM reasoning. In this project, we aim to build a vision and language system that learns and understands the world like a child. It understands other agents' intent through its interpretation of visual perceptions; here, the interpretation is the other agent's plan to achieve her goal (i.e., it grounds perception in planning), and the system can describe and evaluate this interpretation in natural language. We know humans formulate complex goals, and we know those goals are influenced by language. This project is therefore both an effort to shed light on the fundamentals of human planning and how we conceive of plans, and at the same time an effort to understand videos at a higher, more human-like level.

Yen-Ling Kuo

Generating Plans for Agents using Video and Language

Human cognition is extremely flexible. We can easily adapt to new tasks, generalize about the world around us, and use perception to learn from our environment. When we think about models of artificial intelligence, often the opposite is true: the models cannot generalize to novel tasks, they perform in constrained environments, and they are often disconnected from perception. I aim to build models grounded in perception that tackle classical planning problems in AI in a new realm. I use a joint approach that combines techniques in natural language processing and computer vision. Using language and vision, my work aims to observe agents in videos and generate plans for optimal goal accomplishment.

Candace Ross

Semantic Parsing using Vision

Many existing approaches to learning semantic parsers require annotated parse trees or are entirely unsupervised. In contrast, children acquire language through interacting with the world around them. We aim to learn language in a manner more similar to children: by distant supervision through captioned videos. Our model is a joint vision-language system that learns the meaning of words by observing the actions from a sentence depicted in a video. This work enables more robust parsing that incorporates perception.
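The distant-supervision signal can be sketched as cross-situational learning: associate caption words with detected action labels by counting how often they co-occur across videos. The tiny corpus and action labels below are invented for illustration and are not the project's actual model or data.

```python
# Sketch of learning word meanings from captioned video by co-occurrence.
from collections import Counter, defaultdict

# Hypothetical (caption, detected-action-labels) pairs.
corpus = [
    ("the person picks up the cup", {"pick_up"}),
    ("someone picks up a book", {"pick_up"}),
    ("the person puts down the cup", {"put_down"}),
]

# Count how often each caption word co-occurs with each action label.
cooc = defaultdict(Counter)
for caption, actions in corpus:
    for word in caption.split():
        for action in actions:
            cooc[word][action] += 1

def best_action(word):
    """Return the action label most often co-occurring with the word."""
    return cooc[word].most_common(1)[0][0]

print(best_action("picks"))  # prints "pick_up"
```

The appeal of this setup is that no parse trees are annotated: the video itself supplies the (noisy) supervision, much as the world does for a child.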

Candace Ross, Battushig Myanganbayar, Andrei Barbu, Yevgeni Berzak