Web-based Information Access
Although audio-only scenarios (e.g., telephone) can clearly benefit
from spoken language technology, multimedia environments offer many
opportunities as well. A table, map, or other visual image can often
succinctly convey information to the user. There is a wide range of
scenarios where speech and displays could function effectively
together, including handheld devices, kiosks, vehicles, and TVs. As
the web becomes as pervasive as the telephone, it becomes an excellent
medium to explore multimedia architectures. Web-based deployment
allows us to target many different devices with a single
architecture while making our prototypes available to a wide user base.
In our current web-based interfaces, we use conventional browsers as
the basis for our GUI. We currently use a Java applet to transport
audio, but are looking into other possibilities, including various
VoIP methods. A version of the web page with a layout optimized for
smaller displays is under development; it is currently being tested
in an automotive setting.
One of our most advanced web-based prototypes is called the City
Browser. It has a wide set of capabilities which make it useful
both for obtaining urban information and for navigation. The
interface, presented in a web browser, is map-centric, utilizing a map
generated via the Google Maps API. Speech is transmitted to the
server via a Java applet embedded in the page. All speech and
language processing takes place using the GALAXY framework. The
graphical user interface, accessible by a standard web browser, is
techniques enable dynamic updates.
City Browser is scalable, and currently has knowledge of 10
major U.S. metropolitan areas, typically encompassing 20-50 miles
around a large city. The following is an example interaction:
CB: Welcome to City Browser
User: Show me 32 Vassar Street in Cambridge.
CB: Here is 32 Vassar Street in Cambridge.
[Shown on map]
User: Are there any Indian restaurants near here?
CB: There are 10 Indian restaurants near 32 Vassar Street in Cambridge. [Displays results]
User: What is the nearest subway stop to India Pavilion?
CB: Central Square station is located at Massachusetts Avenue and Western Avenue, Prospect Street, and Magazine Street. It is on the Red Line.
User: Are there any museums in this area? [Circles an area of the map]
CB: The Museum of Fine Arts is located at 465 Huntington Avenue in Boston.
User: How do I get to there from 77 Massachusetts Avenue in Cambridge?
CB: Here are driving directions from 77 Massachusetts Avenue in Cambridge to the Museum of Fine Arts. [Gives link to directions]
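The interaction above combines two kinds of spatial reference: "near here" refers back to the address currently in focus, and "this area" refers to a region circled on the map. The following Python sketch illustrates one way such queries could be resolved against a list of places. It is not the City Browser or GALAXY implementation; the names Place, places_near, and places_in_region, the one-mile radius, and all sample coordinates are hypothetical and made up for illustration.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


@dataclass(frozen=True)
class Place:
    """Hypothetical point-of-interest record (not a City Browser type)."""
    name: str
    category: str
    lat: float
    lon: float


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def places_near(places, category, focus, radius_miles):
    """Resolve a 'near here' query: keep places of a category within a radius of the focus."""
    lat, lon = focus
    return [p for p in places
            if p.category == category
            and haversine_miles(lat, lon, p.lat, p.lon) <= radius_miles]


def point_in_polygon(lat, lon, polygon):
    """Ray-casting test; polygon is a list of (lat, lon) vertices traced by the gesture."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Does this edge straddle the query point's longitude?
        if (lon1 > lon) != (lon2 > lon):
            lat_cross = lat1 + (lon - lon1) * (lat2 - lat1) / (lon2 - lon1)
            if lat < lat_cross:
                inside = not inside
    return inside


def places_in_region(places, category, polygon):
    """Resolve a 'this area' query: keep places of a category inside the circled region."""
    return [p for p in places
            if p.category == category and point_in_polygon(p.lat, p.lon, polygon)]


# Made-up sample data; coordinates are placeholders, not real locations.
PLACES = [
    Place("Restaurant A", "restaurant", 42.365, -71.10),
    Place("Restaurant B", "restaurant", 42.50, -71.30),
    Place("Museum A", "museum", 42.34, -71.09),
]
FOCUS = (42.36, -71.09)  # the address currently in focus
REGION = [(42.33, -71.10), (42.33, -71.08), (42.35, -71.08), (42.35, -71.10)]

if __name__ == "__main__":
    print([p.name for p in places_near(PLACES, "restaurant", FOCUS, 1.0)])
    print([p.name for p in places_in_region(PLACES, "museum", REGION)])
```

In this sketch the dialogue context is reduced to a single focus point, and the circled gesture is approximated by a polygon of sampled map coordinates. A deployed system would presumably track richer context and query a geographical database rather than an in-memory list.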
The video below shows some of the current capabilities of the
A. Gruenstein and S. Seneff, "Releasing a Multimodal Dialogue
System into the Wild: User Support Mechanisms," Proc. SIGdial Workshop
on Discourse and Dialogue, Antwerp, Belgium, September 2007.
A. Gruenstein and S. Seneff, "Context-Sensitive Language Modeling
for Large Sets of Proper Nouns in Multimodal Dialogue Systems,"
Proc. IEEE/ACL Workshop on Spoken Language Technology, Palm
Beach, Aruba, December 2006.
A. Gruenstein, S. Seneff, and C. Wang, "Scalable and Portable
Web-Based Multimodal Dialogue Interaction with Geographical Database,"
Proc. Interspeech, Pittsburgh, Pennsylvania, September 2006.