Modeling Syntactic Context Improves Morphological Segmentation

Yoong Keok Lee, Aria Haghighi, and Regina Barzilay

In Proceedings of CoNLL-2011

Code

replicates the results in the paper

works with Python 2.7 and Boost 1.54 (updated: Nov 14, 2014)
performs maximum marginal decoding, which improves accuracy
runs much faster thanks to a C++ implementation
includes training heuristics for corpora too large to fit into memory
supports UTF-8 data
provides options to apply a learned model to new text without retraining from scratch

(It gives segmentation performance similar to the original code, but does not reproduce its output exactly because it uses a different random number generator library.)
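
As a rough illustration of the maximum marginal decoding mentioned in the feature list, the sketch below picks, for each word, the segmentation that occurs most often across a set of sampled analyses. It is not taken from the released code: the function name `max_marginal_decode`, the dictionary-based sample format, and the toy samples are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def max_marginal_decode(samples):
    """Return the most frequent segmentation of each word across samples.

    `samples` is a hypothetical format: a list of dicts, each mapping a
    word to one sampled segmentation (a tuple of morphemes).
    """
    counts = defaultdict(Counter)
    for sample in samples:
        for word, segmentation in sample.items():
            counts[word][segmentation] += 1
    # For each word, keep the segmentation with the highest sample count.
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


# Toy samples, invented for illustration only.
samples = [
    {"walking": ("walk", "ing"), "cats": ("cat", "s")},
    {"walking": ("walk", "ing"), "cats": ("cats",)},
    {"walking": ("wal", "king"), "cats": ("cat", "s")},
]
decoded = max_marginal_decode(samples)
```

Deciding each word by its own most frequent analysis, instead of taking a single final sample, averages out the noise of any one sample, which is one plausible reason this kind of decoding helps accuracy.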
