Spoken Language Systems
MIT Computer Science and Artificial Intelligence Laboratory

Voice lessons - Computers are beginning to grasp the spoken word

Stephanie Schorow

Boston Herald
Page 051
(Copyright 1997)

A lot of us talk to our computers. And mostly, the computers don't understand us.

Which is, probably, a good thing - since we're generally grousing at the machine for something or other.

Still, the time when computers will understand us (and, let's hope, politely ignore any rudeness) is not that far distant. Enhanced speech recognition software programs are not only storming the market, but are going on sale at prices well within the reach of those of us not inclined to buy every new computer gadget.

We're still a long way from HAL, the suavely sincere (and sinister) megamachine of "2001: A Space Odyssey," but we've taken the first major steps in that direction.

Last month, IBM released ViaVoice for Microsoft Windows 95 and Windows NT, a program that can turn conversational dictation into a document for $99. After loading the software, users begin "training" the machine with a three-sentence session. They may read 265 more sentences to improve recognition to 95 percent.

ViaVoice joined Dragon NaturallySpeaking, released earlier by industry leader Dragon Systems of Newton at $695. The price has since dropped to $349. Dragon Systems was among the first companies to deliver the breakthrough feature of the newer speech recognition programs: the capacity to recognize natural speech. That is, You. Don't. Have. To. Speak. Like. This. To. Be. Understood.

"It's the first step of the new natural speech era," said Dragon Systems CEO James Baker.

Baker likes to tell people that although HAL won't be ready by 2001, the company is on track to create the hyperaware computers of "Star Trek," since the TV series is set 200 years hence.

Over at the Massachusetts Institute of Technology, computer science researchers are more interested in dialogue than dictation. Victor Zue, associate director of the MIT Laboratory for Computer Science and a speech recognition pioneer, has been as intent on monitoring reactions to innovations as on developing the latest gadget.

"We want to know how people behave when confronted with a new technology," said Zue.

Zue believes research has chipped away at the three major barriers in recognizing human speech: large vocabularies, continuous speech patterns and speakers with different accents or pronunciations.

He demonstrates. Picking up a telephone, Zue calls Jupiter, a weather information program run by MIT's intuitive conversational interface system called Galaxy. Speaking as he would to an operator, he asks, "What will the weather be like in Boston tomorrow?" Jupiter, in a tinny voice, rattles off a forecast. "What cities do you know in China?" Jupiter recites them. "What is the weather tomorrow in Beijing?" Jupiter tells him, adding politely, "Is there something else you'd like to know?"

Such response is startling at first; you wait for the guy to emerge from behind the curtain. But the call-in service (check for the number) was created not as a lark but to record how people react to Jupiter.

A surprising number, Zue said, are very polite, thanking him/her/it for information. Many ask, "What's your name?" (which the computer can answer). At least five have asked, "What is the meaning of life?" (which Jupiter can't answer).

The ultimate goal, Zue said, is to design computers around human needs, rather than attempt to force humans to adapt. (As anyone who has tried to program a VCR and found herself wishing she could just say, "Record E.R. tonight," will agree.)

Computers that take dictation may be effective for physicians, lawyers and corporate chiefs, but employees in today's cubicle-based workplace might prefer not to be overheard. Moreover, said Zue, even a 95 percent accuracy rate means there's one mistake in every 20 words, and the average Wall Street Journal sentence runs 22 words. Even MIT's Jupiter gets confused, mistaking the word "tomorrow" for "Idaho" or "Pittsfield" for "Italy."

The real goal is the creation of "machines that can converse with you and help you solve problems," Zue said.

Can you hear what I hear?

Like most sentient beings, computers loaded with speech recognition software have a learning curve. The more you use it (and correct it), the better it works.

While that process goes on, the mistakes can be frequent - and amusing.

After loading IBM's ViaVoice on one of our PCs, we practiced dictating some familiar phrases. Here's what the computer thought we said. (Note: we didn't complete all of the advanced training, which would have reduced mistakes.)

Romeo, Romeo, where Ford are thou Romeo use?

To beat or not to be that is the question.

I think that I shall never see a poll as lovely as 83.

This land is your land, this land is smiling and, from California to the New York violence.

And, finally, several versions of the famous Star Trek slogan:

Beat me up study. Peter me up the spotty. Beam me up, study. Dean me up, Scott T.

