Spoken Language Systems
MIT Computer Science and Artificial Intelligence Laboratory


What We Do
Provide Universal Access
As computers increasingly permeate our daily lives, our demand for online information is skyrocketing. Growing numbers of us turn to the Internet to catch up on the latest news, sports, and weather, obtain stock quotes, reserve airline flights, conduct research, or check out what's playing at local theaters. Unfortunately, navigating through vast amounts of data to obtain useful information can require technical savvy and a time-consuming series of keyboard entries and mouse clicks. But there is a more efficient, more flexible tool available for human-computer interaction, something that even the most technically challenged of us could use anywhere, any time: spoken language.

To make it possible for humans to speak to computers, a conversational interface is needed. A conversational interface enables humans to converse with machines (in much the same way we communicate with one another) in order to create, access, and manage information and to solve problems. It is what Hollywood and every "vision of the future" tell us we must have. Since 1989, getting computers to communicate the way people do -- by speaking and listening -- has been the objective of the Spoken Language Systems (SLS) Group at MIT's Computer Science and Artificial Intelligence Laboratory.

How does it work?
Imagine talking to a computer to find a needle-in-a-haystack job listing, or the showtimes of a movie premiere at the closest theater. Today, obtaining such information online requires a programmed transaction between the user, who clicks through a pre-determined sequence of options and views results, and the computer, which retrieves user-selected data. With spoken language systems, however, user and machine can engage in a spontaneous, interactive conversation, incrementally arriving at the desired information in far fewer steps. A case in point is the following excerpt from a conversation between a user and JUPITER, an SLS-based weather forecast system:

User: Yes, I would like the weather forecast for London, England, please.

JUPITER: In London in England Wednesday, partly cloudy skies with periods of sunshine. High 82 and low 63. Is there something else?

User: What is that in degrees Celsius, please?

JUPITER: In London in England Wednesday, high 28 Celsius and low 17 Celsius. What else?

SLS researchers make this kind of dialogue look easy by empowering the computer to perform five main functions in real time: speech recognition -- converting the user's speech to a text sentence of distinct words; language understanding -- breaking down the recognized sentence grammatically and systematically representing its meaning; information retrieval -- obtaining targeted data, based on that meaning representation, from the appropriate online source; language generation -- building a text sentence that presents the retrieved data in the user's preferred language; and speech synthesis -- converting that text sentence into computer-generated speech. Throughout the conversation, the computer also remembers previous exchanges. In this example, JUPITER can respond to "What is that in degrees Celsius, please?" because the user has just asked about weather conditions in London. Otherwise, the system would ask the user to clarify the question.
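The five functions and the role of remembered context can be illustrated with a minimal Python sketch. This is not JUPITER's or the SLS Group's implementation: every name here (FORECASTS, DialogueState, recognize, understand, retrieve, generate, synthesize, respond) is a hypothetical stand-in, speech recognition and synthesis are stubbed out as plain text, and language understanding is reduced to keyword matching. The forecast values are taken from the excerpt above and are assumed to be in degrees Fahrenheit.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for an online weather source.
# Values come from the excerpt above (assumed Fahrenheit).
FORECASTS = {
    "london": {"label": "London in England", "day": "Wednesday",
               "high_f": 82, "low_f": 63},
}


@dataclass
class DialogueState:
    """Remembers the city from earlier exchanges so follow-ups can be resolved."""
    last_city: Optional[str] = None


def recognize(audio: str) -> str:
    # Speech recognition (stub): the "audio" is already text in this sketch.
    return audio.lower().strip()


def understand(sentence: str, state: DialogueState) -> dict:
    # Language understanding (toy keyword matcher) producing a meaning frame.
    frame = {"city": None, "unit": "F"}
    for city in FORECASTS:
        if city in sentence:
            frame["city"] = city
    if "celsius" in sentence:
        frame["unit"] = "C"
    # Discourse context: with no city mentioned, reuse the remembered one.
    if frame["city"] is None:
        frame["city"] = state.last_city
    return frame


def retrieve(frame: dict) -> Optional[dict]:
    # Information retrieval: look up data based on the meaning frame.
    return FORECASTS.get(frame["city"]) if frame["city"] else None


def to_celsius(fahrenheit: float) -> int:
    return round((fahrenheit - 32) * 5 / 9)


def generate(frame: dict, data: Optional[dict]) -> str:
    # Language generation: build a text sentence from the retrieved data.
    if data is None:
        return "Which city would you like the weather for?"
    high, low = data["high_f"], data["low_f"]
    if frame["unit"] == "C":
        return (f"In {data['label']} {data['day']}, high {to_celsius(high)} "
                f"Celsius and low {to_celsius(low)} Celsius.")
    return f"In {data['label']} {data['day']}, high {high} and low {low}."


def synthesize(text: str) -> str:
    # Speech synthesis (stub): return the text instead of producing audio.
    return text


def respond(audio: str, state: DialogueState) -> str:
    frame = understand(recognize(audio), state)
    data = retrieve(frame)
    if frame["city"]:
        state.last_city = frame["city"]  # remember for later exchanges
    return synthesize(generate(frame, data))


if __name__ == "__main__":
    state = DialogueState()
    print(respond("Yes, I would like the weather forecast for London, England, please.", state))
    print(respond("What is that in degrees Celsius, please?", state))
```

With a shared DialogueState, the follow-up question yields highs and lows of 28 and 17 Celsius, matching the excerpt (82 and 63 degrees Fahrenheit, rounded). With a fresh state, the same question produces a clarification request instead, which is the behavior the paragraph above describes.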

Many speech-based interfaces can be considered conversational, and they may be differentiated by the degree to which the system maintains an active role in the conversation, or by the complexity of the potential dialogue. At one extreme are system-initiative, or "directed-dialogue", transactions, where the computer takes complete control of the interaction by requiring that the user answer a set of prescribed questions, much like the touch-tone implementation of interactive voice response (IVR) systems. In the case of air travel planning, for example, a directed-dialogue system could ask the user to "Please say just the departure city." Since the user's options are severely restricted, successful completion of such transactions is easier to attain, and indeed there have been successful demonstrations and commercial deployments of such systems. At the other extreme are user-initiative systems, in which the user has complete freedom in what they say to the system (e.g., "I want to visit my grandmother") while the system remains relatively passive, asking only for clarification when necessary. In this case, the user may feel uncertain as to what capabilities exist and may, as a consequence, stray quite far from the system's domain of competence, leading to great frustration because nothing is understood. Lying between these two extremes are systems that incorporate a "mixed-initiative", goal-oriented dialogue, in which both the user and the computer participate actively to solve a problem interactively using a conversational paradigm. It is this last mode of interaction that is the primary focus of our research.
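The contrast between directed and mixed-initiative dialogue can be sketched as slot filling for the air travel example. This is a toy illustration under stated assumptions, not how any SLS system works: the slot names, the city list, the "from X" / "to X" parser, and all function names are hypothetical.

```python
import re

SLOTS = ("departure", "arrival")          # hypothetical slots to fill
CITIES = ("boston", "denver", "chicago")  # hypothetical vocabulary


def next_prompt(filled: dict) -> str:
    """Ask for the first missing slot, or confirm once all slots are filled."""
    for slot in SLOTS:
        if slot not in filled:
            return f"Please say just the {slot} city."
    return f"Looking up flights from {filled['departure']} to {filled['arrival']}."


def directed_turn(filled: dict, utterance: str) -> str:
    """System-initiative: accept only a bare city name for the prompted slot."""
    pending = next((s for s in SLOTS if s not in filled), None)
    text = utterance.lower().strip()
    if pending is not None and text in CITIES:
        filled[pending] = text
    return next_prompt(filled)


def parse_slots(utterance: str) -> dict:
    """Toy parser: 'from X' fills departure, 'to X' fills arrival."""
    text = utterance.lower()
    found = {}
    for city in CITIES:
        if re.search(rf"\bfrom {city}\b", text):
            found["departure"] = city
        if re.search(rf"\bto {city}\b", text):
            found["arrival"] = city
    return found


def mixed_turn(filled: dict, utterance: str) -> str:
    """Mixed-initiative: take whatever the user volunteers, ask for the rest."""
    filled.update(parse_slots(utterance))
    return next_prompt(filled)


if __name__ == "__main__":
    request = "I want to fly from Boston to Denver"
    print(directed_turn({}, request))  # the free-form request is not accepted
    print(mixed_turn({}, request))     # both slots are filled in one turn
```

In the directed version, the free-form request fills nothing and the system repeats its prompt, so the user needs one turn per slot. In the mixed version, the same request fills both slots at once, and the system would prompt only for anything still missing.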

In 1994, the SLS Group developed a conversational architecture called GALAXY that incorporates the necessary human language technologies (i.e., speech understanding and generation, discourse and dialogue) to enable advanced research in mixed-initiative interaction. Since then, the open source architecture has been adopted by many researchers around the world as a framework for conducting their research on advanced spoken dialogue systems. Here at MIT, we have developed numerous prototype conversational systems, many of which are deployed on toll-free telephone numbers, that enable users to access information about weather forecasts (JUPITER), airline scheduling (PEGASUS) and flight planning (MERCURY), Cambridge city locations (VOYAGER), and selected Web-based information (WebGALAXY).

Raising the Level of Human to Computer Conversation
Although tremendous progress has been made over the last decade in developing advanced conversational spoken language technology, much additional progress must be achieved before conversational interfaces approach the naturalness of human-human conversations. Today SLS researchers are refining core human language technologies and are combining speech with other kinds of natural input modalities such as pen and gesture. They are working to improve the efficiency and naturalness of application-specific conversations, improve the detection and learning of new words during speech recognition, increase the portability of core technologies, and develop new applications. As the SLS Group continues to address these issues, it brings us closer to the day when anyone, anywhere, any time, can interact easily with computers.

Further Reading:

V. Zue and J. Glass, "Conversational Interfaces: Advances and Challenges," Proceedings of the IEEE, Special Issue on Spoken Language Processing, Vol. 88, August 2000.

J. Glass and S. Seneff, "Flexible and Personalizable Mixed-Initiative Dialogue Systems," presented at HLT-NAACL 2003 Workshop on Research Directions in Dialogue Processing, Edmonton, Canada, May 2003.

V. Zue, et al., "JUPITER: A Telephone-Based Conversational Interface for Weather Information," IEEE Transactions on Speech and Audio Processing, Vol. 8, No. 1, January 2000.

32 Vassar Street
Cambridge, MA 02139 USA
(+1) 617.253.3049

©2020, Spoken Language Systems Group. All rights reserved.
