SFC log
2002.02.20
SFC overview and initial direction:
- prototype system, with stubs for components not yet implemented
- inputs: audio (speech), video (face, gesture, and document), scanner (document), touch pad, keyboard, mouse
- low-latency display and availability of inputs
- naming and archiving of inputs: granularity? compression?
- what hardware?
- what OS? Red Hat Linux
- how to set it up?
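The naming and archiving questions above are still open. As a point of discussion only, here is a minimal sketch of one possible answer: one directory per input modality, one file per captured datum named by capture time plus a sequence number, with optional gzip compression. The scheme, the function names `archive_name` and `archive`, and the per-file granularity are all hypothetical, not decisions from the meeting.

```python
import gzip
import os
import shutil
from datetime import datetime

# Hypothetical naming scheme: <modality>/<YYYYMMDD-HHMMSS>-<seq>.<ext>
# The sequence number keeps names unique when several inputs share a second.
def archive_name(modality: str, captured: datetime, seq: int, ext: str) -> str:
    stamp = captured.strftime("%Y%m%d-%H%M%S")
    return os.path.join(modality, f"{stamp}-{seq:04d}.{ext}")

def archive(src_path: str, root: str, modality: str, captured: datetime,
            seq: int, ext: str, compress: bool = True) -> str:
    """Copy one captured input into the archive tree, optionally gzip-compressed.

    Returns the path of the archived file.
    """
    dest = os.path.join(root, archive_name(modality, captured, seq, ext))
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    if compress:
        dest += ".gz"
        with open(src_path, "rb") as src, gzip.open(dest, "wb") as out:
            shutil.copyfileobj(src, out)
    else:
        shutil.copyfile(src_path, dest)
    return dest
```

For example, `archive_name("audio", datetime(2002, 2, 20, 14, 5, 9), 7, "wav")` gives `audio/20020220-140509-0007.wav` on a POSIX system. Time-based names sort chronologically, which makes it easy to line up inputs from different modalities captured at the same moment.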
2002.02.26
SFC meeting with Seth Teller, Jim Glass, Joe Polifroni, David Karger,
Fred Strom, and Trevor Darrell.
location
- could possibly use the multimodal input setup in the SLS group (but is there enough space?)
- better option: build a 3rd multimodal clone, or a different setup in the 2nd-floor hardware bay
hardware
- use a system similar to the one SLS uses for multimodal input
- use the directional microphone array from SLS
- support FireWire and use the iBot FireWire webcam
- use a USB webcam as a backup
- use a fast automatic document feeding scanner
critical path
- speech recognition
- document scanning and OCR
- video input: document, gesture, and face
- smart backend: Haystack, text search, and special-purpose agents
speech recognition
- get the speech system from SLS; it requires Red Hat 7.2/7.3 or Debian
- start building a training set from utterances
- modify the audio box to stream to the recognizer based on multimodal cues
- 3-minute chunks of speech will let SFC listen and transcribe continuously
- to get it going: start with echoing utterances and a reimbursement application
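The 3-minute chunking idea can be sketched as a generator that regroups an incoming stream of raw PCM audio blocks into fixed-length chunks, each of which could then be handed to the recognizer. This is a minimal sketch under stated assumptions: audio arrives as raw PCM byte blocks, and the function name `chunk_stream` and its parameters are hypothetical; how chunks are actually passed to the SLS recognizer is not specified here.

```python
from typing import Iterable, Iterator

CHUNK_SECONDS = 3 * 60  # 3-minute chunks, as proposed in the notes

def chunk_stream(blocks: Iterable[bytes], frame_rate: int, bytes_per_frame: int,
                 chunk_seconds: int = CHUNK_SECONDS) -> Iterator[bytes]:
    """Regroup a stream of raw PCM blocks into fixed-length chunks.

    Every yielded chunk holds exactly chunk_seconds of audio, except the
    last one, which holds whatever remains when the stream ends.
    """
    chunk_bytes = frame_rate * bytes_per_frame * chunk_seconds
    buf = bytearray()
    for block in blocks:
        buf.extend(block)
        # A single large block may complete more than one chunk.
        while len(buf) >= chunk_bytes:
            yield bytes(buf[:chunk_bytes])
            del buf[:chunk_bytes]
    if buf:
        yield bytes(buf)
```

As a rough size estimate, assuming 16 kHz, 16-bit mono audio (2 bytes per frame), each 3-minute chunk is 16000 × 2 × 180 = 5,760,000 bytes, about 5.5 MB before compression.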
document scanning and OCR
- find OCR software -- is there anything open source? (GOCR/JOCR on sourceforge.net, but development has not been active for about a year)
- others? OCR Shop for Linux (based on OmniPage) costs more than $1K without an educational discount.
video input
- start with a simple document-focused camera using a single iBot or USB webcam input; do background subtraction against the best frame or an average frame
- look into Video for Linux resources for capture info
- by summer, a system for gesture and associated object recognition might be ready to incorporate into SFC; it uses stereo video input and has a significant setup cost
- face recognition could be incorporated into the machine -- a demo of face recognition already exists
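The background-subtraction step for the document camera can be sketched as: build a background from the average of several frames of the empty desk (or use a single chosen "best" frame directly), then mark pixels that differ from it by more than a threshold. This is a minimal sketch assuming grayscale frames stored as flat row-major lists of 0-255 values; the function names and the default threshold of 30 are hypothetical choices, not tuned values.

```python
from typing import List, Sequence

# Assumed frame format: grayscale pixels, row-major, values 0-255.
Frame = Sequence[int]

def average_frame(frames: Sequence[Frame]) -> List[float]:
    """Per-pixel mean of several frames of the empty scene."""
    n = len(frames)
    return [sum(pixels) / n for pixels in zip(*frames)]

def foreground_mask(frame: Frame, background: Sequence[float],
                    threshold: float = 30) -> List[bool]:
    """True where a pixel differs from the background by more than threshold.

    background can be an average frame or a single "best" frame.
    """
    return [abs(p - b) > threshold for p, b in zip(frame, background)]
```

Averaging smooths out sensor noise in the background estimate, while a single best frame is simpler but keeps that frame's noise; either can be passed as `background`.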
smart backend
- start with simple text search for associations (perhaps using Lucene)
- environment for storing data and annotations? initially something very simple -- perhaps just an associated annotation file or table entry for each datum, with simple text search. eventually use Haystack (contacts: dquan@ai.mit.edu and dfhuynh@ai.mit.edu).
- start with a specialized, smart agent for each application -- eventually make a more general solution
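The "very simple" initial backend can be sketched as an annotation entry per datum plus an inverted index for text search. This is a stand-in for illustration only, not Lucene or Haystack; the class `AnnotationIndex`, its methods, and the datum IDs in the example are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

class AnnotationIndex:
    """Annotations per datum, plus an inverted index from words to datum IDs."""

    def __init__(self) -> None:
        self.annotations: Dict[str, str] = {}
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def annotate(self, datum_id: str, text: str) -> None:
        # Append to the datum's annotation, like adding a line to its annotation file.
        self.annotations[datum_id] = (self.annotations.get(datum_id, "") + " " + text).strip()
        for word in tokenize(text):
            self.postings[word].add(datum_id)

    def search(self, query: str) -> Set[str]:
        """Return IDs of data whose annotations contain every query word."""
        words = tokenize(query)
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result
```

For example, after annotating a scanned receipt with "Travel receipt, Boston hotel" and an utterance with "reimbursement for Boston trip", a search for "boston" returns both, while "boston hotel" returns only the receipt. Swapping this for Lucene, and later Haystack, would replace the index without changing the idea of one annotation per datum.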