Mass High Tech
Nov. 25-Dec. 1, 1996
Volume 14, Issue 41
Page 1

Fans of Star Trek take it for granted that when Captain Picard needs information from a database aboard the Starship Enterprise, he simply talks to his computer.

After years of research, commercial voice recognition software is now available here on Earth, where it will increasingly be used to make purchases, check airline schedules and even search for Web sites.

The technology has been developed by the Spoken Language Systems Group at MIT's Laboratory for Computer Science (LCS). Since the late 1980s, the Galaxy project at LCS has been creating software systems that give users new ways to communicate with computers, ways that take the load off overworked fingertips.

"We were born with mouths and ears rather than a keyboard and mouse," observes Michael Dertouzos, director of LCS, who says voice recognition technology is now ready for widespread use. He says Galaxy and other language-based technologies will change how humans and machines communicate, from programming to conversations in which a person and a machine cooperate to solve problems.

But he cautions that users shouldn't expect to have deep conversations with their PC.

"It must be confined to a narrow domain of discussion," says Dertouzos. "It can't be a discussion of freedom or politics."

Early voice recognition systems, which were often targeted to the disabled community, required users to pause after every word and had to be trained to recognize a specific voice. But the Galaxy system allows users to speak in a natural cadence. The software allows computers to receive voice information via a built-in microphone or over the phone.

The Galaxy technology is now spilling over into the marketplace. Mark Phillips, who worked on the Galaxy project for seven years, co-founded Applied Language Technologies Inc. (ALTech) in 1994 to license the Galaxy technology and bring voice recognition software to the commercial user.

ALTech recently announced the release of SpeechWorks 2.0 and accompanying DialogModules, which give developers the first opportunity to build advanced telephony-based speech recognition applications with Galaxy technology.

Researchers say that voice recognition systems will soon replace many of the touchtone features that callers use to navigate a maze of menus when they request automated information over the phone. But along the way, researchers working on voice recognition software have had to solve challenging technical problems that have taken years to work out.

In order to respond to a spoken command, a computer must first recognize every word in the sentence, parse the sentence into grammatical elements, understand the meaning and act on it. A spoken command is first processed by speech recognition software, then by a natural language component which interprets the meaning of the words. This data is then used to retrieve appropriate information in the form of text, tables and graphics, which appear on a computer screen. Data can also be delivered to a user via a synthesized computer voice.
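The stages described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Galaxy software: the recognizer is a stand-in that receives text instead of audio, and the `Frame` class, the `FLIGHTS` table and every function name are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical flight table standing in for a real reservation database.
FLIGHTS = [
    {"origin": "boston", "destination": "london", "day": "friday", "departs": "18:30"},
    {"origin": "boston", "destination": "london", "day": "friday", "departs": "21:15"},
    {"origin": "boston", "destination": "paris", "day": "friday", "departs": "19:00"},
]


@dataclass
class Frame:
    """Meaning extracted from one request (a hypothetical structure)."""
    origin: str
    destination: str
    day: str


def recognize(utterance: str) -> List[str]:
    """Stand-in for speech recognition: split typed text into lowercase words."""
    return re.findall(r"[a-z']+", utterance.lower())


def understand(words: List[str]) -> Optional[Frame]:
    """Toy natural language step: look for 'from X to Y on DAY'."""
    match = re.search(r"from (\w+) to (\w+) on (\w+)", " ".join(words))
    if match is None:
        return None
    return Frame(*match.groups())


def act(frame: Frame) -> List[str]:
    """Retrieve departure times that match the request."""
    return [
        flight["departs"]
        for flight in FLIGHTS
        if (flight["origin"], flight["destination"], flight["day"])
        == (frame.origin, frame.destination, frame.day)
    ]


request = "Show me all the flights leaving from Boston to London on Friday"
frame = understand(recognize(request))
print(act(frame) if frame else "Sorry, I did not understand.")
```

Each stage hands a simpler, more structured result to the next: words, then a meaning frame, then retrieved data. A real system would replace each function with a far more capable component.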

The Galaxy team has had to write programs which model how people hear and what spontaneous irregularities appear in spoken English, Spanish, Chinese and Japanese. They have also tried to accommodate foreign accents.

Since language is a largely social activity that supports a shared collection of assumptions, it is difficult for a computer to understand the context in which words are used. The Galaxy system tackles this problem by operating in a limited number of knowledge domains or topic areas.

Galaxy researchers have been working with a handful of domains including the Voyager CityGuide, a visitor's guide to Cambridge, and Pegasus, which is connected to the American Airlines Easy Sabre reservation system. Galaxy is also linked to a weather database which allows a user to check on the weather in 250 cities.
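One simple way to picture a system that works only within a few knowledge domains is a router that matches a request against a keyword list for each domain and declines anything else. The sketch below is an assumption made for illustration; the article does not say how Galaxy selects a domain, and the domain names and keyword sets here are hypothetical.

```python
import re
from typing import Optional

# Hypothetical keyword lists, loosely modeled on the three domains named above.
DOMAIN_KEYWORDS = {
    "city_guide": {"restaurant", "restaurants", "hotel", "museum", "near"},
    "flights": {"flight", "flights", "fly", "airline", "leaving"},
    "weather": {"weather", "forecast", "rain", "temperature"},
}


def choose_domain(utterance: str) -> Optional[str]:
    """Return the domain sharing the most keywords with the request, or None."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    scores = {
        name: len(words & keywords) for name, keywords in DOMAIN_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # A score of zero means the request falls outside every known domain.
    return best if scores[best] > 0 else None


for request in (
    "Locate all the Italian restaurants in Cambridge near the LCS",
    "What is the weather in Boston",
    "Tell me about freedom and politics",
):
    print(request, "->", choose_domain(request))
```

Refusing out-of-domain requests, rather than guessing, mirrors the point that the conversation must stay within a narrow domain of discussion.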

Here's how it works. Seated in front of his PC, Galaxy researcher Jim Glass picks up his phone and asks his computer to locate all the Italian restaurants in Cambridge near the LCS. His request is translated into text at the top of the screen. He confirms that the request is accurate and a list of the desired restaurants appears.

"Show me all the flights leaving from Boston to London on Friday," says Glass, and the Pegasus domain displays a list of flight times on his screen.

The Galaxy system now accesses information sitting on MIT servers, but Glass predicts that by 1998, similar software will be used to download information from the Web by recognizing spoken key words. Glass says one of the goals of the Galaxy project is to link domains of information together and make voice recognition software portable across domains and computer languages.

Glass predicts that by 1998, voice recognition software will let users surf the Web hands-free, allowing key words to be entered verbally. Dertouzos predicts that by the year 2000, consumers will be able to purchase a $200 voice recognition software program with a 5,000-word vocabulary.

But PC users must now insert special telephony cards to create a voice interface to their computer. As a result, LCS and ALTech are currently focusing on displayless technology in which a computer delivers the requested data in a synthesized voice. Glass says the technology, which is best suited for telecom and enterprise applications, will be in use by next year.

"We are working on making it more accessible, using phones and displayless technology so that the computer can have a conversation with a person," says Glass. "It's a matter of the interface catching up with the hardware."

Glass notes that AT&T is using speech recognition software to recognize the digits in credit card numbers and NYNEX is developing a phone-based restaurant guide based on voice recognition software. The LCS is now participating in a trial in which users employ a telephone interface to fill in on-screen forms, and Glass predicts more innovative combinations of voice and display technology.

"Instead of going to the computer, we want the computer to come to the human," says Glass. "We're just crossing the threshold now of computers becoming faster and getting into multiple domains. It's a very exciting time."