"Ask, and Ye Shall Compute"
Mass High Tech
Nov. 25-Dec. 1, 1996
Volume 14, Issue 41
Fans of Star Trek take it for granted that when Captain Picard needs information from a database
aboard the Starship Enterprise, he simply talks to his computer.
After years of research, commercial voice recognition software is now available here on Earth, where it
will increasingly be used to make purchases, check airline schedules and even search for Web sites.
The technology has been developed by the Spoken Language Systems Group at MIT's Laboratory for Computer
Science (LCS). Since the late 1980s, the Galaxy project at LCS has been creating software systems that
give users new ways to communicate with computers, ways that take the load off overworked fingertips.
"We were born with mouths and ears rather than a keyboard and mouse," observes Michael Dertouzos,
director of the LCS, who says voice recognition technology is now ready for widespread use. He says
Galaxy and other language-based technologies will change the process of human and machine communication,
from programming to conversations where a person and a machine cooperate to solve problems.
But he cautions that users shouldn't expect to have deep conversations with their PC.
"It must be confined to a narrow domain of discussion," says Dertouzos. "It can't be a discussion of
freedom or politics."
Early voice recognition systems, which were often targeted to the disabled community, required users
to pause after every word and had to be trained to recognize a specific voice. But the Galaxy system
allows users to speak in a natural cadence. The software allows computers to receive voice information
via a built-in microphone or over the phone.
The Galaxy technology is now spilling over into the marketplace. Mark Phillips, who worked on the Galaxy
project for seven years, co-founded Applied Language Technologies Inc. (ALTech) in 1994 to license the
Galaxy technology and bring voice recognition software to the commercial user.
ALTech recently announced the release of SpeechWorks 2.0 and accompanying DialogModules, which give
developers the first opportunity to build advanced telephony-based speech recognition applications with
Researchers say that voice recognition systems will soon replace many of the touchtone features that
callers use to navigate a maze of menus when they request automated information over the phone. But
along the way, researchers working on voice recognition software have had to meet challenging technical
requirements that have taken years to refine.
In order to respond to a spoken command, a computer must first recognize every word in the sentence,
parse the sentence into grammatical elements, understand the meaning and act on it. A spoken command is
first processed by speech recognition software, then by a natural language component which interprets
the meaning of the words. This data is then used to retrieve appropriate information in the form of
text, tables and graphics, which appear on a computer screen. Data can also be delivered to a user via
a synthesized computer voice.
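The stages described above can be illustrated with a minimal sketch. It is not the Galaxy software: the function names, the flight table and the pattern-matching "parser" are all hypothetical, and the speech recognizer is replaced by a stub that receives the words as text.

```python
import re

# Hypothetical toy data standing in for a real reservation database.
FLIGHTS = [
    {"origin": "boston", "destination": "london", "day": "friday", "departs": "18:30"},
    {"origin": "boston", "destination": "london", "day": "friday", "departs": "21:15"},
    {"origin": "boston", "destination": "paris", "day": "friday", "departs": "19:00"},
]


def recognize(utterance):
    """Stub for the speech recognizer: the 'audio' is already text here."""
    return utterance.lower().strip(" ?.!")


def parse(text):
    """Stub for the natural language component: fill a small meaning frame."""
    match = re.search(r"flights .*from (\w+) to (\w+) on (\w+)", text)
    if match is None:
        return None
    origin, destination, day = match.groups()
    return {"origin": origin, "destination": destination, "day": day}


def retrieve(frame):
    """Return the rows whose fields match every slot in the frame."""
    return [f for f in FLIGHTS if all(f[k] == v for k, v in frame.items())]


def respond(rows):
    """Format the rows as a sentence that could be shown or synthesized."""
    if not rows:
        return "No matching flights found."
    times = ", ".join(r["departs"] for r in rows)
    return f"Found {len(rows)} flight(s) departing at {times}."


def answer(utterance):
    """Run the four stages in order: recognize, parse, retrieve, respond."""
    frame = parse(recognize(utterance))
    if frame is None:
        return "Sorry, I did not understand the request."
    return respond(retrieve(frame))


print(answer("Show me all the flights leaving from Boston to London on Friday?"))
# prints: Found 2 flight(s) departing at 18:30, 21:15.
```

A real system would replace each stub with a far more capable component, but the hand-off from one stage to the next follows the same order as the description above.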
The Galaxy team has had to write programs which model how people hear and what spontaneous irregularities
appear in spoken English, Spanish, Chinese and Japanese. They have also tried to accommodate foreign
Since language is a largely social activity that supports a shared collection of assumptions, it is
difficult for a computer to understand the context in which words are used. The Galaxy system tackles
this problem by operating in a limited number of knowledge domains or topic areas.
Galaxy researchers have been working with a handful of domains including the Voyager CityGuide, a
visitor's guide to Cambridge, and Pegasus, which is connected to the American Airlines Easy Sabre
reservation system. Galaxy is also linked to a weather database which allows a user to check on the
weather in 250 cities.
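One way to picture the limited-domain idea is a router that assigns each request to a topic area and declines anything outside them. The sketch below is an assumption for illustration only: the keyword lists and the `pick_domain` function are hypothetical and do not describe how Galaxy actually selects a domain.

```python
import re

# Hypothetical keyword sets for the three domains named in the article.
DOMAIN_KEYWORDS = {
    "voyager": {"restaurant", "restaurants", "hotel", "museum", "street"},
    "pegasus": {"flight", "flights", "fare", "airline", "depart"},
    "weather": {"weather", "forecast", "rain", "temperature", "snow"},
}


def pick_domain(request):
    """Return the domain sharing the most keywords with the request, or None."""
    words = set(re.findall(r"[a-z]+", request.lower()))
    scores = {name: len(words & keys) for name, keys in DOMAIN_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # A score of zero means the request falls outside every known domain.
    return best if scores[best] > 0 else None


print(pick_domain("Locate all the Italian restaurants in Cambridge"))  # voyager
print(pick_domain("What is the weather in Boston?"))                   # weather
print(pick_domain("Let's discuss freedom and politics"))               # None
```

Returning `None` for an out-of-domain request mirrors the caution quoted earlier: the conversation has to stay within a narrow set of topics.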
Here's how it works. Seated in front of his PC, Galaxy researcher Jim Glass picks up his phone and
asks his computer to locate all the Italian restaurants in Cambridge near the LCS. His request is
translated into text at the top of the screen. He confirms that the request is accurate and a list of
the desired restaurants appears.
"Show me all the flights leaving from Boston to London on Friday," says Glass, and the Pegasus domain
displays a list of flight times on his screen.
The Galaxy system now accesses information sitting on MIT servers, but Glass predicts that by 1998,
similar software will be used to download information from the Web by recognizing spoken key words. Glass
says one of the goals of the Galaxy project is to link domains of information together and make voice
recognition software portable across domains and computer languages.
Glass predicts that by 1998, voice recognition software will let users surf the Web hands free,
allowing key words to be entered verbally. Dertouzos predicts that by the year 2000, consumers will
be able to purchase a $200 voice recognition software program with a 5,000-word vocabulary.
But PC users must now insert special telephony cards to create a voice interface to their computer.
As a result, LCS and ALTech are currently focusing on displayless technology in which a computer delivers
the requested data in a synthesized voice. Glass says the technology, which is best suited for telecom
and enterprise applications, will be in use by next year.
"We are working on making it more accessible, using phones and displayless technology so that the
computer can have a conversation with a person," says Glass. "It's a matter of the interface catching
up with the hardware."
Glass notes that AT&T is using speech recognition software to recognize the digits in credit card
numbers, and NYNEX is developing a phone-based restaurant guide based on voice recognition software. The
LCS is now participating in a trial in which users employ a telephone interface to fill in on-screen
forms, and Glass predicts more innovative combinations of voice and display technology.
"Instead of going to the computer, we want the computer to come to the human," says Glass. "We're
just crossing the threshold now of computers becoming faster and getting into multiple domains. It's
a very exciting time."