
user interface issues

SFC folks,

this morning i logged about fifty example utterances
and ten example queries for the SFC, using joe et al.'s
ipaq setup (thanks).

that got me thinking about the user interface that
will mediate all of this.  here are some comments
meant to spark further discussion.

* it's a pain to have to indicate the start and
end of each utterance.  let's think about how to 
delimit a continuous speech stream, and how to bind
portions of the stream to the documents that the
user is indicating.  

ideally, this binding could happen in real time,
with some kind of visual feedback to the user; for
example, a low-res copy of the document, rectified
to be upright, shown on the SFC display with a 
highlighted portion of the text stream shown next
to it.  
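
to make the binding idea concrete, here's a rough sketch of matching
utterance segments to scan events by timestamp.  all the names and data
structures here are hypothetical -- we have no real APIs yet -- it just
shows one simple policy (nearest scan event within a time window):

```python
# sketch: bind utterance segments to document-scan events by time proximity.
# everything here is hypothetical; it only illustrates the binding policy.
from dataclasses import dataclass

@dataclass
class Utterance:
    start: float   # seconds since session start
    end: float
    text: str

@dataclass
class ScanEvent:
    time: float    # moment the system "grabbed" the page
    doc_id: str

def bind(utterances, scans, window=5.0):
    """Attach each utterance to the scan event nearest in time,
    if that event falls within `window` seconds of the utterance."""
    bindings = {}
    for u in utterances:
        mid = (u.start + u.end) / 2
        nearest = min(scans, key=lambda s: abs(s.time - mid), default=None)
        if nearest is not None and abs(nearest.time - mid) <= window:
            bindings.setdefault(nearest.doc_id, []).append(u.text)
    return bindings
```

a real version would have to handle utterances that span two documents,
but even this much would support the visual feedback described above.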

* we should think about the actual machinery we'll
use for input.  maybe we don't need a sheet-fed
scanner.  maybe we can get away with a high-res
color camera, mounted so as to point down onto a
scanning area.  a 3K x 2K camera scanning an 11 x 8 1/2"
document will have something like 250 dots per inch
effective resolution, which may be enough for our
purposes.  
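
a quick sanity check on that arithmetic (the 3K x 2K and page-size figures
are the ones above; the short axis is what limits effective resolution):

```python
# back-of-the-envelope effective resolution for a downward-pointing camera
# capturing a full US-letter page
sensor_w, sensor_h = 3000, 2000   # pixels: the "3K x 2K" camera above
page_w, page_h = 11.0, 8.5        # inches

dpi_w = sensor_w / page_w         # ~273 dpi along the long edge
dpi_h = sensor_h / page_h         # ~235 dpi along the short edge

# the worse axis is the binding constraint
effective_dpi = min(dpi_w, dpi_h)
```

so "something like 250 dpi" is about right -- roughly 235 dpi on the
limiting axis, which should still be workable for OCR of body text.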

this would shift some work onto the user -- to place
each document onto the scan area, and wait for the
camera to grab it -- but if the camera and system
were fast enough, the SFC could indicate when it
had "grabbed" the document and the user could go on
to the next.

* utterances can have temporal aspects:  "this is
the receipt from yesterday's food shopping."  or
"here's a coupon, remind me to use it before it
expires."  so temporal context has to be in our
understanding model.
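
as a minimal illustration (not a proposal for the real understanding
model), resolving a relative time expression just means anchoring it to
the scan time.  the expressions and function name here are made up:

```python
# sketch: resolve a few relative time expressions against the scan date.
# a real understanding model needs far more; this only shows the anchoring idea.
from datetime import date, timedelta

def resolve_relative(expr, scan_date):
    table = {
        "today": scan_date,
        "yesterday": scan_date - timedelta(days=1),
        "last week": scan_date - timedelta(weeks=1),
    }
    return table.get(expr)  # None if we can't resolve the expression
```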

* many documents have an on-line source or analogue.
example:  we print something from a browser in order
to carry it around and use it in the world.  at home
we want to file it, but don't want the trouble of
finding it again on-line.  (or, it may be ephemeral
by nature, like a confirmation slip.)  so as part
of the SFC "back end," links from the document (as
embedded text URLs) should be stored, as should links
*to* the document, if they can be inferred.
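
the "links from the document" half is the easy one -- something like this
regex pass over the OCR'd text would do as a first cut (the pattern and
trailing-punctuation handling are just a sketch):

```python
# sketch: pull embedded URLs out of OCR'd document text so the back end
# can store a link from the paper copy to its on-line source.
import re

URL_RE = re.compile(r"https?://[^\s)>\"']+")

def extract_links(ocr_text):
    # strip punctuation that often trails a URL in running text
    return [u.rstrip(".,;") for u in URL_RE.findall(ocr_text)]
```

links *to* the document are harder, since they have to be inferred from
context (print-job logs, browser history, etc.).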

another example came from my papers:  a honda paper
directive asking me to register my car VIN online,
for email updates.  it would be cool if a software
"agent" could take action based on the instructions
(and my imperative utterance).


thoughts about the query process:

* we should retain low-res images of everything,
for fast thumbnail display and browsing on the SFC
display (perhaps a touchscreen?).  we have strong
visual memory, and can recognize (as opposed to
recall) things from their appearance much more easily.
perhaps the SFC display could bring up many docs
that match a partial query, and we could sharpen
the query by adding more restrictive words ("no,
just the coupons for _x_") or simply by touching
the screen.
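
the word-based sharpening step could be as simple as intersecting the
result set with each new word -- a sketch, with a made-up index format
(doc id plus a set of index terms):

```python
# sketch: narrow a thumbnail result set by requiring every new query word.
# the (doc_id, index-terms) representation is hypothetical.
def refine(results, new_words):
    """Keep only documents whose index terms contain all the new words."""
    wanted = {w.lower() for w in new_words}
    return [(doc_id, terms) for doc_id, terms in results
            if wanted <= {t.lower() for t in terms}]
```

touching a thumbnail is then just the degenerate case: a refinement that
selects exactly one document.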

well, this was a bit rambling, but i hope it is 
useful...

seth.