Random Newsbytes: Next: the conversational interface?: Challenges remain in improving how a computer handles a dialogue

Tony Waltham*

12/10/97
Bangkok Post
Copyright 1997 The Bangkok Post

Computer users have largely shifted from a command line interface to a graphical user interface, using a mouse to move around the screen and clicking on icons to "launch" programs.

Making this possible have been the rapid advances in processing power and the steady drop in the cost of computer memory and storage. Ten years ago, Windows/286 showed the direction, but Windows ran like treacle on 286 machines of that era.

Even the Intel 80386 processor, the newest chip on the block a decade back, did not really have enough power for Windows, and it took the 486-based machines to enable Windows 3.x to really gain its commanding footprint on the desktop.

Similarly, today you can talk to your PC.

Dragon Dictate is one popular PC application, while IBM offers comparable products. These systems can be trained to improve their recognition of a user's commands, but they are sensitive to the microphone used and sometimes require a specific sound card.

We are still at the pioneering stage with PC-based voice recognition software: if the user cooperates and is patient, it can usually be "made" to work.

However, Intel and Microsoft have both stated that the "human interface" will drive sales of future generations of microprocessors and software applications, and that the continuing miniaturisation which is giving us faster chips and more memory per square inch will make this possible.

Victor Zue, Associate Director of the MIT Laboratory for Computer Science, explained to delegates at the Fourth Natural Language Processing Pacific Rim Symposium in Phuket last week that this and two other factors were driving what he called the "conversational interface," which he said was "inevitable."

One underlying fundamental is "the human desire to communicate," while the other -- in addition to miniaturisation -- is the increased connectivity and networking that is occurring, such as over the Internet.

He demonstrated live to the audience one such manifestation of the conversational interface, using a telephone handset as the "client". Dialling up MIT in Cambridge, Massachusetts, Prof Zue spoke to a computer called "Jupiter".

He asked for a weather forecast for the Cambridge area. Then he asked it how many cities it knew in Thailand. The response was "I know one city in Thailand ... Bangkok." Prof Zue asked for a forecast for Bangkok, and got an immediate response over the telephone from the MIT computer.

This demonstration of conversational access to on-line information or services clearly shows that it can be done today, although Prof Zue explained that the "expertise" demonstrated was limited to a narrow domain of knowledge -- that is, one where both the expected questions and the answers are fairly limited in scope.

Other such "domains" developed to date handle on-line enquiries about, for example, movies that are showing, restaurants or job opportunities.

Also demonstrated was "virtual browsing" using a conversational interface, whereby information was requested verbally and a computer looked it up on the Web and presented the responses on a computer screen.

Queries could be made regarding the information displayed, and the responses again appeared on the screen in text mode.

There are clear opportunities for such services once the technology -- "where much research remains to be done" -- becomes more sophisticated, and computers will then appear in places where they are not found today.

They will also be accessible with a telephone call, so the ability to process and understand words spoken over a telephone -- and consequently of lower audio quality -- was important, he said.

The model that MIT foresaw was a distributed one, with servers that each focussed on a domain of knowledge and that could be added to a system incrementally. The ability to deal with continuous speech from unknown users was important, while the vocabulary should be 1,000 words or more.

Importantly, processing should be in real time -- that is, with no perceptible delay -- and the system should operate on standard platforms, he said.
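The distributed model described above can be pictured in a few lines of code. The sketch below is purely illustrative -- all names and the keyword-matching logic are this column's invention, not MIT's actual architecture -- but it shows the core idea: independent "domain servers," each expert in one narrow topic, registered incrementally with a router that passes each query to the best-matching domain.

```python
# Illustrative sketch of a distributed conversational interface: domain
# servers are added incrementally, and a router dispatches each query.
# All class and variable names are hypothetical.

class DomainServer:
    def __init__(self, name, keywords, answer):
        self.name = name
        self.keywords = keywords      # words that signal this domain
        self.answer = answer          # canned responder for the sketch

    def score(self, query):
        # Count how many of this domain's keywords appear in the query.
        words = query.lower().split()
        return sum(1 for k in self.keywords if k in words)

    def respond(self, query):
        return self.answer(query)


class Router:
    def __init__(self):
        self.servers = []

    def register(self, server):
        # New domains of knowledge can be added to the system at any time.
        self.servers.append(server)

    def handle(self, query):
        best = max(self.servers, key=lambda s: s.score(query), default=None)
        if best is None or best.score(query) == 0:
            return "Sorry, I have no domain for that."
        return best.respond(query)


router = Router()
router.register(DomainServer(
    "weather", ["weather", "forecast", "rain"],
    lambda q: "Bangkok: hot with scattered showers."))
router.register(DomainServer(
    "movies", ["movie", "movies", "showing"],
    lambda q: "Three movies are showing tonight."))

print(router.handle("What is the weather forecast for Bangkok?"))
print(router.handle("Which movies are showing?"))
```

A real system would of course replace the keyword match with continuous speech recognition and natural language understanding, but the incremental, server-per-domain shape is the point.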

Multiple languages can be catered for, both for the questions and for the responses -- and such systems can even be used for language learning, explained Prof Stephanie Seneff, also of the MIT Laboratory for Computer Science.

And if PC users who dabbled in Microsoft Windows on a 286 machine 10 years ago spent more time looking at the hourglass symbol than they cared to, speech processing and recognition of a decade ago lagged even further behind.

Prof Seneff recalled how you could input a sentence and go away to make a pot of tea, or even go off to lunch, because a computer a decade ago could take up to 20 minutes to process the sentence and respond to it.

Challenges remain. The issues include designing a system that can learn new words, since languages are dynamic, as well as improving how a computer handles a dialogue. Such a system should be neither completely passive nor too assertive.

In a conversation such as a telephone enquiry there is a lot of "back channel," or very short dialogue (such as "yes" or "umm"), and 80 percent of remarks contain fewer than 12 words, Prof Zue explained.

It is reassuring that research institutes around the world -- including several universities here in Thailand as well as the symposium's organisers, the National Electronics and Computer Technology Centre (Nectec), Kasetsart University and the Asian Institute of Technology -- are addressing these issues.

And, as such, I can be fairly confident in predicting that 10 years from now we will be talking with computers a lot. The key question is, if the "conversation" is over a telephone, will we realise whether it is a computer or a real person at the other end?

*Tony Waltham is Editor of Database.