
The Galaxy's guide to the hitch-hiker: voice recognition.
(Galaxy computer language recognition system)

05/11/96
The Economist
Page 77
COPYRIGHT 1996 Economist Newspaper Ltd. (UK)

IN FILMS about the future, computers generally feature either as electronic personifications of evil, or as your plastic pal who's fun to be with. But whichever guise they appear in, they differ from the cussed, keyboard-driven creatures familiar to contemporary man in two important ways. They can listen. And they can talk.

Talking is not too difficult--speech-synthesis programs have been around for a while, though computers are hardly great conversationalists. Listening and, more importantly, understanding are still mostly science fiction, though. But the desire to design a listening machine is strong in software engineers. So strong that they have been at it for several decades, with what might generously be described as limited success.

The problems are legion. The computer must first recognise each word in a sentence, even when spoken in a sloppy, everyday cadence. It must parse the sentence into its grammatical elements. It must then try to understand the meaning of what it has parsed, and finally it must act on it.
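The four steps just listed (recognise, parse, understand, act) can be pictured as a chain of functions, each feeding the next. The sketch below is a toy illustration only, not a description of any real system: the "speech" is already text, the "grammar" is a single rule, and every function name and topic word is an assumption made up for the example.

```python
# Toy four-stage pipeline: recognise -> parse -> understand -> act.
# Every name and rule here is hypothetical, for illustration only.

KNOWN_TOPICS = {"weather", "restaurants", "flights"}


def recognise(utterance):
    """Stand-in for speech recognition: text in, lower-case word list out."""
    return [w.strip("?.,!").lower() for w in utterance.split()]


def parse(words):
    """Crude 'grammar': split the sentence at the preposition 'in'."""
    if "in" in words:
        i = words.index("in")
        return {"head": words[:i], "place": words[i + 1:]}
    return {"head": words, "place": []}


def understand(tree):
    """Map the parsed pieces onto a small meaning frame."""
    topic = next((w for w in tree["head"] if w in KNOWN_TOPICS), None)
    place = " ".join(tree["place"]) or "an unspecified place"
    return {"topic": topic, "place": place}


def act(frame):
    """Act on the meaning; here, just describe the lookup that would be made."""
    if frame["topic"] is None:
        return "Sorry, I did not understand."
    return f"look up {frame['topic']} for {frame['place']}"


def answer(utterance):
    """Run the four stages in order."""
    return act(understand(parse(recognise(utterance))))
```

Calling answer("What will the weather be like tomorrow in Manhattan?") walks the question through all four stages. A real recogniser would have to cope with audio, sloppy pronunciation and far richer grammar, which is where the difficulty lies.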

This is a formidable task, particularly for the limited brains of the personal computers (PCs) on which a useful system would have to run. But on top of all this, there is the problem of context. Critics of a philosophical bent argue that language is a social activity requiring shared background assumptions. Since those assumptions can never be completely and objectively listed, they cannot be programmed into a computer.

However, the Spoken Language Systems Group at the Massachusetts Institute of Technology's Laboratory for Computer Science, which is led by Victor Zue, thinks it has gone some way towards dealing with these difficulties. It has done so by moving most of the processing from PCs to powerful central machines that can do some serious word-crunching, and by dodging the issue of context.

Dr Zue's systems restrict context by operating in limited "knowledge domains". Two prototypes that have been running for the past few years are extremely limited. Voyager is a user's guide to Cambridge, helping people to find their way to Chinese restaurants and other tourist necessities. And Pegasus is connected to the American Airlines EaasySABRE reservation system. But a third system, Galaxy, which Dr Zue is in the process of developing, manages to have it both ways. By operating in a bigger virtual world--the World Wide Web of the Internet--it can link disparate knowledge domains together in a way that is invisible to the user.

Galaxy's goal is to do for the Internet what graphical icons did for personal computers--give it an intuitive feel that anyone can grasp. A person asking Galaxy for help does not need to know who created the information sought, which computer it sits on, its address on the network, or even its "transmission protocol" (ie, which acronym or jokey mnemonic--such as ftp, gopher or http--is needed to fetch it). He merely asks, "What will the weather be like tomorrow in Manhattan?" and, with luck, the answer will come back in a few seconds (though not necessarily any more accurately than if he had got it from a human weather forecaster). And if Galaxy does not understand the first time, it can ask supplementary questions, like an interviewer trying to pin down the ambiguities of a politician.

Galaxy's manifestation in a user's PC is a small program known as a client. This serves merely as a courier, shuttling the user's questions to a piece of heavyweight silicon known as a language server, and then taking the server's reply and delivering it back to the PC's user. The language servers do the computationally intensive work of speech recognition and language processing. Each also knows how to interrogate the relevant "knowledge server" for the question at hand. The knowledge servers find the data a language server needs to compose its answers, which the client program on the original PC then turns into synthesised speech (or typed text for those unfortunates still chained to a screen). Making all this possible requires an enormous amount of programming.
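The division of labour described above (a thin client, a language server, and knowledge servers behind it) can be sketched in a few lines. This is a minimal in-process illustration under stated assumptions, not Galaxy's actual design: there is no network or speech involved, and the class names, the single "weather" domain and the canned answer are all hypothetical.

```python
# In-process sketch of the client / language server / knowledge server split.
# All class names, domains and canned answers are hypothetical.

class KnowledgeServer:
    """Holds the data for one knowledge domain."""

    def __init__(self, facts):
        self.facts = facts  # lookup key -> answer text

    def lookup(self, key):
        return self.facts.get(key)


class LanguageServer:
    """Does the heavy lifting: interprets the question and picks a domain."""

    def __init__(self, domains):
        self.domains = domains  # domain name -> KnowledgeServer

    def handle(self, question):
        words = [w.strip("?.,!").lower() for w in question.split()]
        for name, server in self.domains.items():
            if name in words:
                # Try each word as a lookup key in the chosen domain.
                for word in words:
                    fact = server.lookup(word)
                    if fact is not None:
                        return fact
                # Domain recognised but no key found: ask a follow-up.
                return f"Which place do you mean for {name}?"
        return "Sorry, I did not understand."


class Client:
    """Thin courier on the user's PC: forwards the question, relays the reply."""

    def __init__(self, language_server):
        self.language_server = language_server

    def ask(self, question):
        return self.language_server.handle(question)


weather = KnowledgeServer({"manhattan": "Placeholder forecast for Manhattan."})
client = Client(LanguageServer({"weather": weather}))
```

Here client.ask("What will the weather be like tomorrow in Manhattan?") returns the placeholder forecast, while a question that names the domain but no place gets a supplementary question back. The client holds no knowledge of its own, so new domains can be added on the server side without touching it.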

The basic research that Dr Zue's team has had to carry out in order to write Galaxy's programs includes modelling how people hear, what linguistic regularities pop up in their spontaneous speech (which may, in this context, be English, Spanish, Chinese or Japanese), and how to deal with such problems as foreign accents. Only after it has understood the question can the server seek an answer from the appropriate knowledge domain. (Galaxy knows more than 4,000 English words, and dealing with unknown ones was yet another wrinkle that Dr Zue's team had to iron out.)

Ultimately, Dr Zue hopes to eliminate the PC altogether, and have a system that answers questions down a telephone. Whether other people will create enough knowledge-domain servers out there to provide the answers, and whether they will be fun to be with, remains to be seen.