The Galaxy's guide to the hitch-hiker: voice recognition.
(Galaxy computer language recognition system)
COPYRIGHT 1996 Economist Newspaper Ltd. (UK)
IN FILMS about the future, computers generally feature either as electronic personifications
of evil, or as your plastic pal who's fun to be with. But whichever guise they appear in, they
differ from the cussed, keyboard-driven creatures familiar to contemporary man in two important
ways. They can listen. And they can talk.
Talking is not too difficult--speech-synthesis programs have been around for a while, though
computers are hardly great conversationalists. Listening and, more importantly, understanding
are still mostly science fiction. But the desire to design a listening machine is strong
in software engineers. So strong that they have been at it for several decades, with what might
generously be described as limited success.
The problems are legion. The computer must first recognise each word in a sentence, even when spoken
in a sloppy, everyday cadence. It must parse the sentence into its grammatical elements. It must then
try to understand the meaning of what it has parsed, and finally it must act on it.
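The four stages just listed (recognise, parse, understand, act) can be pictured as a chain of small functions. What follows is a toy sketch, not a description of any real recogniser: the function names, the word lists and the canned replies are all invented for illustration, and a typed string stands in for the audio signal.

```python
# Toy sketch of a recognise -> parse -> understand -> act chain.
# Every name and table below is hypothetical; text stands in for audio.

SLOPPY = {"gimme": "give me", "wanna": "want to"}   # everyday contractions
STOPWORDS = {"me", "the", "a", "to", "please"}      # words ignored when parsing
INTENTS = {
    ("give", "weather"): "report_weather",
    ("find", "restaurant"): "list_restaurants",
}
ACTIONS = {
    "report_weather": "Fetching the forecast.",
    "list_restaurants": "Looking up restaurants.",
}

def recognise(utterance):
    """Stage 1: turn a sloppy utterance into a list of plain words."""
    words = []
    for token in utterance.lower().split():
        words.extend(SLOPPY.get(token, token).split())
    return words

def parse(words):
    """Stage 2: crude grammar, first content word is the verb, last is the object."""
    content = [w for w in words if w not in STOPWORDS]
    if len(content) < 2:
        raise ValueError("need at least a verb and an object")
    return {"verb": content[0], "object": content[-1]}

def understand(tree):
    """Stage 3: map the parsed sentence to an intent, or 'unknown'."""
    return INTENTS.get((tree["verb"], tree["object"]), "unknown")

def act(intent):
    """Stage 4: carry out the intent (here, just return a canned reply)."""
    return ACTIONS.get(intent, "Sorry, I did not understand.")

def handle(utterance):
    return act(understand(parse(recognise(utterance))))

print(handle("Gimme the weather"))
```

Each stage in a real system is vastly harder than its stand-in here; the sketch shows only how the output of one stage feeds the next.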
This is a formidable task, particularly for the limited brains of the personal computers (PCs)
on which a useful system would have to run. But on top of all this, there is the problem of
context. Critics of a philosophical bent argue that language is a social activity requiring
shared background assumptions. Since those assumptions can never be completely and objectively
listed, they cannot be programmed into a computer.
The Spoken Language Systems Group at the Massachusetts Institute of Technology Laboratory for
Computer Science, which is led by Victor Zue, thinks, however, that it has gone some way to
dealing with these difficulties. It has done so by removing most of the processing problem from
the PCs to powerful central machines that can do some serious word-crunching, and by dodging the
issue of context.
Dr Zue's systems restrict context by operating in limited "knowledge domains". Two prototypes that
have been running for the past few years are extremely limited. Voyager is a user's guide to Cambridge,
helping people to find their way to Chinese restaurants and other tourist necessities. And Pegasus is
connected to the American Airlines EaasySABRE reservation system. But a third system, Galaxy, which
Dr Zue is in the process of developing, manages to have it both ways. By operating in a bigger virtual
world--the World Wide Web of the Internet--it can link disparate knowledge domains together in a way
that is invisible to the user.
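One way to picture how several limited knowledge domains might be linked behind a single front door is a router that picks a domain from the words in the question. The sketch below is an assumption made for illustration, not Galaxy's actual method: the domain names and vocabularies are invented, and simple keyword overlap stands in for whatever the real system does.

```python
# Hypothetical sketch: choose a knowledge domain by keyword overlap.
# Domain names and vocabularies are invented for illustration.

DOMAINS = {
    "city_guide": {"restaurant", "restaurants", "street", "museum", "directions"},
    "air_travel": {"flight", "flights", "fare", "airline", "depart"},
    "weather": {"weather", "forecast", "rain", "temperature"},
}

def route(question):
    """Return the domain sharing the most words with the question, or None."""
    words = set(question.lower().replace("?", "").split())
    scores = {name: len(words & vocab) for name, vocab in DOMAINS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(route("What will the weather be like tomorrow in Manhattan?"))
print(route("Find a Chinese restaurant"))
```

The user never names a domain; the router infers it, which is the sense in which the join between domains stays invisible.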
Galaxy's goal is to do for the Internet what graphical icons did for personal computers--give it an
intuitive feel that anyone can grasp. A person asking Galaxy for help does not need to know who created
the information sought, which computer it sits on, its address on the network, or even its "transmission
protocol" (ie, which acronym or jokey mnemonic--such as ftp, gopher or http--is needed to fetch it).
He merely asks, "What will the weather be like tomorrow in Manhattan?" and, with luck, the answer will
come back in a few seconds (though not necessarily any more accurately than if he had got it from a
human weather forecaster). And if Galaxy does not understand the first time, it can ask supplementary
questions, like an interviewer trying to pin down the ambiguities of a politician.
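The habit of asking supplementary questions can be sketched as a check for missing details before an answer is attempted. This is a minimal illustration under stated assumptions: the idea of "slots" such as place and day, and the wording of the prompts, are invented here and are not taken from Galaxy itself.

```python
# Hypothetical sketch: ask a follow-up question when a needed detail is missing.
# The slot names and prompts are invented for illustration.

REQUIRED = {"weather": ["place", "day"]}
PROMPTS = {
    "place": "Which city do you mean?",
    "day": "For which day?",
}

def next_step(intent, slots):
    """Return ('ask', question) if a detail is missing, else ('answer', slots)."""
    for name in REQUIRED.get(intent, []):
        if name not in slots:
            return ("ask", PROMPTS[name])
    return ("answer", dict(slots))

# The user said only "in Manhattan", so the system asks about the day.
print(next_step("weather", {"place": "Manhattan"}))
# With both details supplied, it can go on to answer.
print(next_step("weather", {"place": "Manhattan", "day": "tomorrow"}))
```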
Galaxy's manifestation in a user's PC is a small program known as a client. This serves merely as a
courier, shuttling the user's questions to a piece of heavyweight silicon known as a language server,
and then taking the server's reply and delivering it back to the PC's user. The language servers do the
computationally intensive work of speech recognition and language processing. Each also knows how to
interrogate the relevant "knowledge server" for the question at hand. These machines find the data a
language server needs to compose answers for the original PC's client program to turn into synthesised
speech (or a typed text for those unfortunates still chained to a screen). Making all this possible
requires an enormous amount of programming.
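The division of labour among client, language server and knowledge server can be sketched with three small classes. This is an illustrative model only: the class names, methods and the single stored "fact" are assumptions, everything runs in one process rather than across a network, and plain string matching stands in for speech recognition and language processing.

```python
# Hypothetical in-process model of client -> language server -> knowledge server.
# Names, methods and data are invented; no networking or speech is involved.

class KnowledgeServer:
    """Holds the data for one knowledge domain."""
    def __init__(self, facts):
        self.facts = facts

    def lookup(self, key):
        return self.facts.get(key)

class LanguageServer:
    """Does the heavy lifting: interprets the question and queries a domain."""
    def __init__(self, knowledge_servers):
        self.knowledge_servers = knowledge_servers  # topic word -> server

    def answer(self, question):
        words = question.lower().rstrip("?").split()
        for topic, server in self.knowledge_servers.items():
            if topic in words:
                for word in words:
                    fact = server.lookup(word)
                    if fact is not None:
                        return f"The {topic} in {word.title()} will be {fact}."
                return f"Which place do you want the {topic} for?"
        return "Sorry, I cannot answer that."

class Client:
    """The small program on the PC: a courier and nothing more."""
    def __init__(self, language_server):
        self.language_server = language_server

    def ask(self, question):
        return self.language_server.answer(question)

weather = KnowledgeServer({"manhattan": "sunny"})  # made-up data
client = Client(LanguageServer({"weather": weather}))
print(client.ask("What will the weather be like tomorrow in Manhattan?"))
```

The client holds no knowledge and does no interpretation, which is why it can stay small while the servers carry the load.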
The basic research that Dr Zue's team has had to carry out in order to write Galaxy's programs
includes modelling how people hear, what linguistic regularities pop up in their spontaneous speech
(which may, in this context, be English, Spanish, Chinese or Japanese), and how to deal with such
problems as foreign accents. Only after it has understood the question (Galaxy knows more than
4,000 English words, and dealing with unknown ones was yet another wrinkle that Dr Zue's team had
to iron out) can the server seek an answer from the appropriate knowledge domain.
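How a system with a fixed vocabulary might cope with words it does not know can be illustrated, very loosely, with text. The sketch below is an assumption and not the method used by Dr Zue's team: it uses spelling similarity from Python's standard difflib module as a stand-in for acoustic similarity, and the five-word vocabulary is invented.

```python
# Hypothetical sketch: flag words outside a fixed vocabulary and guess near matches.
# Spelling similarity (difflib) stands in for acoustic similarity.
import difflib

VOCABULARY = ["weather", "manhattan", "tomorrow", "restaurant", "flight"]

def classify_words(words):
    """Label each word as known, guessed (close to a known word) or unknown."""
    result = []
    for word in words:
        w = word.lower()
        if w in VOCABULARY:
            result.append((w, "known"))
            continue
        close = difflib.get_close_matches(w, VOCABULARY, n=1, cutoff=0.8)
        if close:
            result.append((close[0], "guessed"))
        else:
            result.append((w, "unknown"))
    return result

print(classify_words(["weather", "wether", "zyzzyva"]))
```

A word labelled unknown could then prompt a supplementary question rather than a wrong answer.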
Ultimately, Dr Zue hopes to eliminate the PC altogether, and have a system that answers questions
down a telephone. Whether other people will create enough knowledge-domain servers out there to
provide the answers, and whether they will be fun to be with, remains to be seen.