Spoken Language Systems
MIT Computer Science and Artificial Intelligence Laboratory


Web-based Information Access

Although audio-only scenarios (e.g., telephone) can clearly benefit from spoken language technology, multimedia environments offer many opportunities as well. A table, map, or other visual image can often succinctly convey information to the user. There is a wide range of scenarios where speech and displays could function effectively together, including handheld devices, kiosks, vehicles, and TVs. As the web becomes as pervasive as the telephone, it becomes an excellent medium for exploring multimedia architectures. Web-based deployment allows us to target many different devices with a single architecture while making our prototypes available to a wide user population.

In our current web-based interfaces, we use conventional browsers as the basis for our GUI. We currently use a Java applet to transport audio, but are looking into other possibilities, including various VoIP methods. An adapted page layout optimized for smaller displays is under development; it is currently being tested in an automotive setting.

One of our most advanced web-based prototypes is called City Browser. It has a wide set of capabilities that make it useful both for obtaining urban information and for navigation. The interface is map-centric, built around a map generated via the Google Maps API, and runs in a standard web browser. The graphical user interface is rendered in HTML, and AJAX (Asynchronous JavaScript and XML) techniques enable dynamic updates. Speech is transmitted to the server via a Java applet embedded in the page, and all speech and language processing takes place within the GALAXY framework.
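As a rough illustration of the dynamic-update idea, the sketch below shows how a server might package one dialogue turn (the system's reply plus the map markers to draw) as a message that a page script could fetch asynchronously and apply without reloading the page. This is not the City Browser or GALAXY message format: the names Marker and build_map_update, the field names, the choice of JSON rather than XML, and the coordinates are all assumptions made for the example.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class Marker:
    """One point to draw on the map (hypothetical structure)."""
    label: str
    lat: float
    lon: float


def build_map_update(spoken_reply: str, markers: List[Marker]) -> str:
    """Serialize one dialogue turn into a JSON string for the page script."""
    payload = {
        "reply": spoken_reply,                      # text the system speaks or displays
        "markers": [asdict(m) for m in markers],    # points to place on the map
    }
    return json.dumps(payload, sort_keys=True)


# Illustrative only: the coordinates are approximate placeholders.
update = build_map_update(
    "Here is 32 Vassar Street in Cambridge.",
    [Marker("32 Vassar Street", 42.3616, -71.0906)],
)
print(update)
```

In a design like this, the browser-side script would only need to parse the message and redraw the affected markers, which is what lets the map stay in place between turns.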

City Browser is scalable, and currently has knowledge of 10 major U.S. metropolitan areas, typically encompassing 20-50 miles around a large city. The following is an example interaction:

CB: Welcome to City Browser
User: Show me 32 Vassar Street in Cambridge.
CB: Here is 32 Vassar Street in Cambridge. [Shown on map]
User: Are there any Indian restaurants near here?
CB: There are 10 Indian restaurants near 32 Vassar Street in Cambridge. [Displays results]
User: What is the nearest subway stop to India Pavilion?
CB: Central Square station is located at Massachusetts Avenue and Western Avenue, Prospect Street, and Magazine Street. It is on the Red Line.
User: Are there any museums in this area? [Circles an area of the map]
CB: The Museum of Fine Arts is located at 465 Huntington Avenue in Boston.
User: How do I get to there from 77 Massachusetts Avenue in Cambridge?
CB: Here are driving directions from 77 Massachusetts Avenue in Cambridge to the Museum of Fine Arts. [Gives link to directions]
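
Two turns in the interaction above depend on spatial filtering: "near here" implies a search within some distance of the last address shown, and the circled area implies a test of which places fall inside a drawn region. The sketch below shows one plausible way to do each, using great-circle (haversine) distance and a ray-casting point-in-polygon test. It is an assumption-laden illustration, not the system's actual method: the function names, the radius parameter, and the planar treatment of latitude and longitude inside the region test are all hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in miles."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def within_radius(center: Point, places: List[Tuple[str, Point]],
                  radius_miles: float) -> List[str]:
    """Names of places no farther than radius_miles from center ("near here")."""
    return [name for name, loc in places
            if haversine_miles(center, loc) <= radius_miles]


def inside_region(p: Point, region: List[Point]) -> bool:
    """Ray-casting point-in-polygon test for a drawn region.

    Treats latitude/longitude as planar coordinates, which is a
    simplification that is only reasonable for small, city-scale regions.
    """
    lat, lon = p
    inside = False
    n = len(region)
    for i in range(n):
        lat1, lon1 = region[i]
        lat2, lon2 = region[(i + 1) % n]
        # Does this edge straddle the point's longitude?
        if (lon1 > lon) != (lon2 > lon):
            # Latitude at which the edge crosses that longitude.
            cross_lat = lat1 + (lon - lon1) * (lat2 - lat1) / (lon2 - lon1)
            if lat < cross_lat:
                inside = not inside
    return inside
```

A dialogue manager built this way would keep the most recently shown location as the reference point for "near here" and convert the user's pen or mouse stroke into the list of vertices passed to inside_region.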

The video below shows some of the current capabilities of the system.

Further Reading

A. Gruenstein and S. Seneff, "Releasing a Multimodal Dialogue System into the Wild: User Support Mechanisms," Proc. SIGdial Workshop on Discourse and Dialogue, Antwerp, Belgium, September 2007.

A. Gruenstein and S. Seneff, "Context-Sensitive Language Modeling for Large Sets of Proper Nouns in Multimodal Dialogue Systems," Proc. IEEE/ACL Workshop on Spoken Language Technology, Palm Beach, Aruba, December 2006.

A. Gruenstein, S. Seneff, and C. Wang, "Scalable and Portable Web-Based Multimodal Dialogue Interaction with Geographical Database," Proc. Interspeech, Pittsburgh, Pennsylvania, September 2006.

32 Vassar Street
Cambridge, MA 02139 USA
(+1) 617.253.3049

©2016, Spoken Language Systems Group. All rights reserved.
