Voice lessons - Computers are beginning to grasp the spoken word
A lot of us talk to our computers. And mostly, the computers don't understand us.
Which is, probably, a good thing - since we're generally grousing at the machine for something or other.
Still, the time when computers will understand us (and, let's hope, politely ignore any
rudeness) is not that far distant. Enhanced speech recognition software programs are not only
storming the market, but are going on sale at prices well within the reach of those of us not
inclined to buy every new computer gadget.
We're still a long way from HAL, the suavely sincere (and sinister) megamachine of "2001:
A Space Odyssey," but we've taken the first major steps in that direction.
Last month, IBM released ViaVoice for Microsoft Windows 95 and Windows NT, a program that can turn
conversational dictation into a document for $99. After loading the software, users begin "training"
the machine with a three-sentence session. They may read 265 more sentences to improve recognition.
ViaVoice joined Dragon Naturally Speaking, released earlier by industry leader Dragon Systems of
Newton at $695. The price has since dropped to $349. Dragon Systems was among the first companies to
produce what is the breakthrough aspect of newer speech recognition programs, that is, the capacity to
recognize natural speech. That is: You. Don't. Have. To. Speak. Like. This. To. Be. Understood.
"It's the first step of the new natural speech era," said Dragon Systems CEO James Baker.
Baker likes to tell people that although HAL won't be ready by 2001, the company is on track to
create the hyperaware computers of "Star Trek," since the TV series is set 200 years hence.
Over at the Massachusetts Institute of Technology, researchers in the computer sciences department
are more interested in dialogue than dictation. Victor Zue, associate director of the MIT Laboratory
for Computer Science and a speech recognition pioneer, has been as intent on monitoring reactions to
innovations as in developing the latest gadget.
"We want to know how people behave when confronted with a new technology," said Zue.
Zue believes research has chipped away at the three major barriers in recognizing human speech:
large vocabularies, continuous speech patterns and speakers with different accents or pronunciations.
He demonstrates. Picking up a telephone, Zue calls Jupiter, a weather information program run by
MIT's intuitive, conversational interface system called Galaxy. Speaking as he would to an operator, he asks, "What will the weather be like in Boston tomorrow?" Jupiter, in a tinny voice, rattles off a forecast. "What cities do you know in China?" Jupiter recites them. "What is the weather tomorrow in Beijing?" Jupiter tells him, adding politely, "Is there something else you'd like to know?"
Such response is startling at first; you wait for the guy to emerge from behind the curtain. But the
call-in service (check www.slc.lcs.mit.edu/jupiter for the number) was created not as a lark but to
record how people react to Jupiter.
A surprising number, Zue said, are very polite, thanking him/her/it for information. Many ask,
"What's your name?" (which the computer can answer). At least five have asked, "What is the meaning
of life?" (which Jupiter can't answer).
The ultimate goal, Zue said, is to design computers around human needs, rather than attempt to
force humans to adapt. (As anyone who has tried to program a VCR while wishing she could just say,
"Record E.R. tonight," will agree.)
Computers that take dictation may be effective for physicians, lawyers and corporate chiefs, but
employees in today's cubicle-based workplace might prefer not to be overheard. Moreover, said Zue,
even a 95 percent accuracy rate means there's one mistake in 20 words. The average length of a
Wall Street Journal sentence is 22 words. Even MIT's Jupiter gets confused, mistaking the word
"tomorrow" for "Idaho" or "Pittsfield" for "Italy."
The real goal, Zue said, is the creation of "machines that can converse with you and help you solve problems."
Can you hear what I hear?
Like most sentient beings, computers loaded with speech recognition software have a learning curve.
The more you use it (and correct it), the better it works.
While that process goes on, the mistakes can be frequent - and amusing.
After loading IBM's ViaVoice on one of our PCs, we practiced dictating some familiar phrases.
Here's what the computer thought we said (note: we didn't complete all of the advanced training,
which would have reduced mistakes):
Romeo, Romeo, where Ford are thou Romeo use?
To beat or not to be that is the question.
I think that I shall never see a poll as lovely as 83.
This land is your land, this land is smiling and, from California to the New York violence.
And, finally, several versions of the famous Star Trek slogan:
Beat me up study. Peter me up the spotty. Beam me up, study. Dean me up, Scott T.