Spoken Language Systems
MIT Computer Science and Artificial Intelligence Laboratory


The conversational systems developed by SLS require a number of specialized core technologies to perform the tasks of speech recognition, natural language understanding, discourse and dialogue modeling, language generation, and speech synthesis. SLS has developed its own component for each of these tasks, as well as an architecture that integrates the individual components into complete systems for specific applications.

An Example
To illustrate these SLS technology components in action, let's consider the following request posed to the MERCURY air travel planning server:

"Is there a flight from Boston to San Franscisco Friday?"

GALAXY: An Architecture for Conversational Speech Systems
Conversational speech systems require the integration of a variety of specialized components. GALAXY provides a client/server architecture for performing this integration task. In our example, GALAXY's hub is responsible for moving the user's request through the various stages of processing described below.
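The hub-and-spoke idea can be sketched in a few lines of code. This is an illustrative toy, not the actual GALAXY protocol: the server names, the dictionary "frame" format, and the fixed pipeline order are all assumptions made for the example.

```python
class Hub:
    """Routes a request frame through an ordered pipeline of servers."""

    def __init__(self):
        self.servers = {}   # name -> callable taking and returning a frame
        self.order = []     # processing order in which servers were registered

    def register(self, name, server):
        self.servers[name] = server
        self.order.append(name)

    def process(self, frame):
        # Pass the frame from server to server; each one adds its results.
        for name in self.order:
            frame = self.servers[name](frame)
        return frame

# Two stand-in "servers": a fake recognizer and a fake parser.
hub = Hub()
hub.register("recognizer", lambda f: {**f, "text": f["audio"].lower()})
hub.register("parser", lambda f: {**f, "words": f["text"].split()})
result = hub.process({"audio": "IS THERE A FLIGHT"})
```

The real system's servers (SUMMIT, TINA, and so on) run as separate processes and exchange frames over the network; the hub's routing role is the same.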

SUMMIT: Speech Recognition
Spoken language reaches the system as a measurable acoustic signal. SUMMIT converts this signal into a sequence of distinct words by matching segments of the incoming signal against a stored library of phonemes -- irreducible units of sound (such as the "B" in "Boston") that make up a word. Relying on internal language models, SUMMIT then generates a ranked list of candidate sentences. In our example, SUMMIT produces the following list:

  1. Is there a flight from Boston to San Francisco Friday?
  2. Is there a flight from Austin to San Francisco Friday?
  3. Is there flight from Boston to San Francisco Friday?
  4. ...
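The ranking step can be illustrated with a toy re-scorer. This sketch is not SUMMIT's actual model: the acoustic scores, the bigram table, and the weighting are all invented numbers, chosen only to show how a language model can demote an acoustically plausible but unlikely hypothesis such as "from Austin".

```python
def lm_score(words, bigram_logprob, floor=-10.0):
    """Sum the log-probabilities of consecutive word pairs."""
    return sum(bigram_logprob.get((a, b), floor)
               for a, b in zip(words, words[1:]))

def rank(hypotheses, bigram_logprob, lm_weight=1.0):
    """Rank (words, acoustic_logprob) pairs by combined score, best first."""
    scored = [(acoustic + lm_weight * lm_score(words, bigram_logprob), words)
              for words, acoustic in hypotheses]
    return [words for _, words in sorted(scored, reverse=True)]

# "austin" scores slightly better acoustically, but the language model
# (invented here) strongly prefers "from boston".
hyps = [(["from", "austin"], -1.0),
        (["from", "boston"], -1.2)]
bigrams = {("from", "boston"): -1.0, ("from", "austin"): -5.0}
best = rank(hyps, bigrams)[0]
```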
TINA: Natural Language Understanding
Beneath the surface representation of words, sentences carry a deeper semantic meaning. To determine what a user actually wants, the system must represent the user's utterance in a logical, meaningful structure. Based on stored rules, TINA parses each sentence into grammatical components, such as subject, verb, object, and predicate. TINA then augments these syntactic components with semantic information and converts the sentence into a semantic frame, a command-like structure consisting of clauses, topics, and predicates. In our example, the semantic frame for "Is there a flight from Boston to San Francisco Friday?" would be

Clause: EXIST
  Quantifier: INDEF
  Predicate: SOURCE
    Topic: CITY
      Name: Boston
  Predicate: DESTINATION
    Topic: CITY
      Name: San Francisco
  Predicate: TIME
    Topic: DATE
      Day: Friday
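One possible in-memory encoding of a frame like this is a nested dictionary; the layout below is an illustrative assumption, not TINA's internal representation, but it shows why the frame is useful: downstream components can query it by type rather than re-parse the sentence.

```python
# Toy encoding of the semantic frame above (field names are assumptions).
frame = {
    "clause": "EXIST",
    "quantifier": "INDEF",
    "predicates": [
        {"name": "SOURCE",
         "topic": {"type": "CITY", "name": "Boston"}},
        {"name": "DESTINATION",
         "topic": {"type": "CITY", "name": "San Francisco"}},
        {"name": "TIME",
         "topic": {"type": "DATE", "day": "Friday"}},
    ],
}

def topic_names(frame, topic_type):
    """Collect the names of all topics of a given type, in frame order."""
    return [p["topic"].get("name") for p in frame["predicates"]
            if p["topic"]["type"] == topic_type]
```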

Dialogue Management
To carry out a user's request, the dialogue manager evaluates the relevance and completeness of the request, retrieves the requested information from the database, and formats an appropriate reply in the form of a semantic frame.
In our example, the dialogue manager might return the following semantic frame representing the retrieved information from the database:
Flights found: 3

  Date: October 19
  Airline: United
  Flight number: 163
  Departure Airport: BOS
  Departure Time: 7:00 AM
  Arrival Airport: SFO
  Arrival Time: 10:23 AM
  Stops: 0

  Date: October 19
  Airline: United
  Flight number: 161
  Departure Airport: BOS
  Departure Time: 9:00 AM
  Arrival Airport: SFO
  Arrival Time: 12:22 PM
  Stops: 0

  Date: October 19
  Airline: American
  Flight number: 195
  Departure Airport: BOS
  Departure Time: 9:00 AM
  Arrival Airport: SFO
  Arrival Time: 12:37 PM
  Stops: 0
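The retrieval step can be sketched as a constraint filter over a flight table. The toy table, field names, and reply layout below are invented for illustration; they are not MERCURY's actual database schema.

```python
# Toy flight table (invented rows; the fourth flight is there only to
# show that non-matching rows are filtered out).
FLIGHTS = [
    {"airline": "United",   "number": 163, "from": "BOS", "to": "SFO",
     "depart": "7:00 AM", "arrive": "10:23 AM", "stops": 0},
    {"airline": "United",   "number": 161, "from": "BOS", "to": "SFO",
     "depart": "9:00 AM", "arrive": "12:22 PM", "stops": 0},
    {"airline": "American", "number": 195, "from": "BOS", "to": "SFO",
     "depart": "9:00 AM", "arrive": "12:37 PM", "stops": 0},
    {"airline": "United",   "number": 201, "from": "BOS", "to": "LAX",
     "depart": "8:00 AM", "arrive": "11:10 AM", "stops": 0},
]

def retrieve(source, destination):
    """Filter the table by the user's constraints and wrap it in a reply frame."""
    rows = [f for f in FLIGHTS
            if f["from"] == source and f["to"] == destination]
    return {"flights_found": len(rows), "flights": rows}

reply = retrieve("BOS", "SFO")
```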

GENESIS: Language Generation
GENESIS processes the components of a semantic frame and generates a text representation of the semantics in the requested language. GENESIS can generate text in natural languages such as English or Chinese, or in formal languages such as the Structured Query Language (SQL). For the MERCURY example, GENESIS takes a frame containing the tabular data returned from the database and converts it into a standard English response. For example:

I have 3 nonstop flights:
A United flight arriving at 10:23 AM,
a United flight arriving at 12:22 PM,
and an American flight arriving at 12:37 PM.
Please select one of these flights or change
any constraint you have already specified.

ENVOICE: Speech Synthesis
ENVOICE is a concatenative speech synthesis system: it creates synthetic speech by concatenating segments of speech drawn from a pre-recorded speech corpus. Concatenation can occur at the phrase, word, or sub-word level.
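A word-level version of the idea can be sketched as a lookup-and-splice loop. The "waveforms" here are just short lists of numbers standing in for audio samples; a real system also selects among multiple recordings of each unit and smooths the joins.

```python
# Toy corpus: each word maps to a fake pre-recorded waveform segment.
CORPUS = {
    "a": [0.1, 0.2],
    "united": [0.3, 0.4, 0.5],
    "flight": [0.6, 0.7],
}

def synthesize(words):
    """Concatenate the stored segment for each word into one waveform."""
    wave = []
    for w in words:
        if w not in CORPUS:
            raise KeyError(f"no recorded segment for {w!r}")
        wave.extend(CORPUS[w])
    return wave
```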

32 Vassar Street
Cambridge, MA 02139 USA
(+1) 617.253.3049

©2020, Spoken Language Systems Group. All rights reserved.
