java.lang.Object
  edu.mit.nlp.segmenter.SegTester
public class SegTester
The purpose of this class is to provide a unified framework to evaluate and run various segmenters.
SegTester -config config -dir dir -suff suff [-init init] [-debug]
config the configuration file for the experiment (see the config directory)
dir the directory where the data files are located
suff the suffix of the data files
init for initializable segmenters (e.g. CuCoSeg), this specifies the name of a file with the initial segmentations.
debug print debugging info
Outputs: the configuration, the files that it's reading in, anything the segmenter itself wants to say,
and the pk/wd per file.
cat file | SegTester -config config [-debug debug] [-num-segs num-segs]
config the configuration file for the experiment (see the config directory)
debug print debugging info
num-segs number of segments desired. If not provided, it is read from the file itself, unless the configuration specifies that the number of segments is unknown.
Outputs: the configuration, the line numbers of the segment endpoints
Field Summary | |
---|---|
protected static String | para_ending |
protected MyTextWrapper[] | texts |
Constructor Summary | |
---|---|
SegTester(ml.options.OptionSet optset) | |
Method Summary | |
---|---|
void | eval(Segmenter segmenter): Evaluates a segmenter. |
static ParaData | getParaData(String filename): Gets "paralinguistic" data, e.g. pause durations and prosodic markers. |
protected void | loadFiles(ml.options.OptionSet optset) |
MyTextWrapper | loadText(String fileName) |
static void | main(String[] args) |
static void | preprocessText(MyTextWrapper text, boolean use_choi, boolean is_windowing_enabled, boolean remove_stops, boolean use_stems, int window_size): Preprocesses the text: stemming, removing stop words, handling segment boundaries, and breaking the text into K-word blocks. |
protected static List | stemStopWords(List stopWords): Stems the stop words. When stemming is enabled, the stop words must be stemmed as well; otherwise they will not match the stemmed text. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected MyTextWrapper[] texts
protected static String para_ending
Constructor Detail |
---|
public SegTester(ml.options.OptionSet optset) throws Exception
Throws:
Exception
Method Detail |
---|
public static void main(String[] args)
protected void loadFiles(ml.options.OptionSet optset)
public MyTextWrapper loadText(String fileName)
public static ParaData getParaData(String filename)
public static void preprocessText(MyTextWrapper text, boolean use_choi, boolean is_windowing_enabled, boolean remove_stops, boolean use_stems, int window_size)
Parameters:
text - the text file to preprocess
use_choi - use Choi-style segment boundaries
is_windowing_enabled - whether to break the text into fixed-length chunks (as opposed to using sentence breaks)
window_size - the size of the fixed-length chunks
remove_stops - whether to remove stopwords
use_stems - whether to use stemming
protected static List stemStopWords(List stopWords)
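The preprocessing described for preprocessText and stemStopWords can be illustrated with a minimal, self-contained sketch. It is not the SegTester implementation: the class name PreprocessSketch, the toyStem suffix stripper, and the use of plain string lists instead of MyTextWrapper are all assumptions made for illustration, and Choi-style boundary handling is omitted. It shows why the stop-word list has to be stemmed once tokens are stemmed, and how the remaining tokens might be grouped into fixed-length windows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class PreprocessSketch {

    // Hypothetical toy stemmer: lower-cases and strips one common suffix.
    // A real experiment would use a proper stemmer instead.
    static String toyStem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String suffix : new String[] {"ing", "ed", "es", "s"}) {
            if (w.length() > suffix.length() + 2 && w.endsWith(suffix)) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // Stem every stop word so the list can match tokens that were stemmed too.
    static List<String> stemStopWords(List<String> stopWords) {
        List<String> stemmed = new ArrayList<>();
        for (String s : stopWords) {
            stemmed.add(toyStem(s));
        }
        return stemmed;
    }

    // Normalize tokens, optionally drop stop words, then cut into windows
    // of windowSize tokens (the last window may be shorter).
    static List<List<String>> preprocess(List<String> tokens, List<String> stopWords,
                                         boolean removeStops, boolean useStems,
                                         int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        Set<String> stops = new HashSet<>();
        if (useStems) {
            stops.addAll(stemStopWords(stopWords));
        } else {
            for (String s : stopWords) {
                stops.add(s.toLowerCase(Locale.ROOT));
            }
        }
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            String norm = useStems ? toyStem(t) : t.toLowerCase(Locale.ROOT);
            if (!(removeStops && stops.contains(norm))) {
                kept.add(norm);
            }
        }
        List<List<String>> windows = new ArrayList<>();
        for (int start = 0; start < kept.size(); start += windowSize) {
            int end = Math.min(start + windowSize, kept.size());
            windows.add(new ArrayList<>(kept.subList(start, end)));
        }
        return windows;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Having", "cats", "chasing", "dogs", "having", "fun");
        List<String> stopWords = Arrays.asList("having");
        // prints [[cat, chas], [dog, fun]]
        System.out.println(preprocess(tokens, stopWords, true, true, 2));
    }
}
```

In this sketch the token "having" is stemmed to "hav", so an unstemmed stop list containing only "having" would no longer match it; stemming the stop list first restores the match.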
public void eval(Segmenter segmenter)
Parameters:
segmenter - the segmenter that we're evaluating
Returns nothing; the results are printed. Uses Malioutov's evaluation code.
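The class description says the evaluation prints "pk/wd" per file. Assuming these refer to the Pk and WindowDiff segmentation metrics, the following sketch shows one common way to compute them. It is not Malioutov's evaluation code: the class name SegEvalSketch, the representation of a segmentation as an array of segment lengths, and the choice of window size (half the mean reference segment length, rounded, at least 1) are assumptions made for illustration.

```java
class SegEvalSketch {

    // Expand segment lengths, e.g. {2, 3}, into per-unit segment ids {0, 0, 1, 1, 1}.
    static int[] toSegmentIds(int[] segmentLengths) {
        int total = 0;
        for (int len : segmentLengths) {
            total += len;
        }
        int[] ids = new int[total];
        int pos = 0;
        for (int seg = 0; seg < segmentLengths.length; seg++) {
            for (int j = 0; j < segmentLengths[seg]; j++) {
                ids[pos++] = seg;
            }
        }
        return ids;
    }

    // Assumed convention: k is half the mean reference segment length.
    static int windowSize(int[] refIds) {
        int numSegments = refIds[refIds.length - 1] + 1;
        return Math.max(1, (int) Math.round(refIds.length / (2.0 * numSegments)));
    }

    private static void check(int[] ref, int[] hyp, int k) {
        if (ref.length != hyp.length) {
            throw new IllegalArgumentException("segmentations must cover the same units");
        }
        if (k < 1 || k >= ref.length) {
            throw new IllegalArgumentException("k must be in [1, n - 1]");
        }
    }

    // Pk: fraction of windows where reference and hypothesis disagree on
    // whether units i and i + k fall in the same segment.
    static double pk(int[] ref, int[] hyp, int k) {
        check(ref, hyp, k);
        int windows = ref.length - k;
        int errors = 0;
        for (int i = 0; i < windows; i++) {
            boolean sameRef = ref[i] == ref[i + k];
            boolean sameHyp = hyp[i] == hyp[i + k];
            if (sameRef != sameHyp) {
                errors++;
            }
        }
        return (double) errors / windows;
    }

    // WindowDiff: fraction of windows where the number of boundaries between
    // units i and i + k differs between reference and hypothesis.
    static double windowDiff(int[] ref, int[] hyp, int k) {
        check(ref, hyp, k);
        int windows = ref.length - k;
        int errors = 0;
        for (int i = 0; i < windows; i++) {
            int refBoundaries = ref[i + k] - ref[i];
            int hypBoundaries = hyp[i + k] - hyp[i];
            if (refBoundaries != hypBoundaries) {
                errors++;
            }
        }
        return (double) errors / windows;
    }

    public static void main(String[] args) {
        int[] ref = toSegmentIds(new int[] {4, 4});
        int[] hyp = toSegmentIds(new int[] {3, 1, 4});
        int k = windowSize(ref);
        System.out.println("k=" + k + " pk=" + pk(ref, hyp, k) + " wd=" + windowDiff(ref, hyp, k));
    }
}
```

For both metrics lower is better, and 0 means the hypothesis agrees with the reference in every window. In the example, the hypothesis places one extra boundary just before the true one; WindowDiff counts two erroneous windows out of six while Pk counts one, because Pk only checks whether the two window ends share a segment and not how many boundaries lie between them.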