Re: smart file cabinet mockup
- To: Smart Filing Cabinet <sfc@graphics.lcs.mit.edu>
- Subject: Re: smart file cabinet mockup
- From: Seth Teller <teller@lcs.mit.edu>
- Date: Mon, 23 Dec 2002 10:00:26 -0500
- Organization: MIT Computer Graphics Group
- References: <3DF8DD23.67CAB85B@lcs.mit.edu> <3E01F8A3.EE11897C@mit.edu>
- Sender: seth
this is a great idea -- i'd like to be a volunteer.
jim, i have a sony digital voice recorder; if i record
some utterances onto that, can i give you guys the
file? otherwise i'm happy to use the ipaq if you
lend me one.
btw, i've created a mailing list for this group,
called
sfc@graphics.lcs.mit.edu
with jim, david, joe, and me as members. please use
it if you remember ... it makes email indexing
possible pre-haystack.
seth.
Jim Glass wrote:
>
> David, Seth,
>
> In creating any new speech-based application I always worry
> about language usage. The technology always works better
> when you have lots of data that reflects how people will
> talk to the eventual system. The challenge is always to get
> the initial data to get the first iteration working. I have
> an idea I want to bounce off of you for the smart file
> cabinet.
>
> We have been using the iPAQ to collect data for our speaker
> ID/face ID experiments. I think we could probably use this
> framework to collect speech data for the smart file cabinet
> too. I would propose that we send people home with the iPAQ
> so they can go through their piles of paper junk and record
> what they might say about each piece. They can also do this
> in their office. We will set up a script to have people
> transcribe what they said when they download it back to our
> file server. Once we have a sizeable corpus we can train up
> an initial system to talk to. There will still be many
> challenges (e.g., out-of-vocabulary words), but the basic
> recognizer should be somewhat competent.
>
> I'll volunteer to get everyone in our group (around 30
> people) to do this. We have multiple iPAQs set up, so we
> think we can collect a lot of data in January if we push.
>
> sound reasonable?
>
> jim
>
> p.s. we could also take pictures of stuff while we're doing
> this, but that would probably make the data collection more
> tedious, and I'm not sure it adds anything.