C++ re-implementation
- works with Python 2.7 and Boost 1.54 (updated: Nov 14, 2014)
- performs maximum marginal decoding, which improves accuracy
- runs much faster thanks to the C++ implementation
- has training heuristics for huge corpora that do not fit into memory
- supports UTF-8 data
- provides options to apply a learned model to new text without re-training from scratch
(It gives segmentation performance similar to the original code, but does not replicate the output exactly because it uses a different random number generator library.)
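
For readers unfamiliar with maximum marginal decoding, here is a minimal sketch of the general idea. It is not taken from this code base. It assumes the segmenter produces several sampled segmentations of the same sequence, each stored as one boundary flag per gap between adjacent symbols. The names `Boundaries`, `maxMarginalDecode`, and `render` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical representation: one flag per gap between adjacent symbols;
// true means "a segment boundary falls in this gap".
using Boundaries = std::vector<bool>;

// For every gap independently, keep the value that occurs in more than half
// of the samples, i.e. the value with the higher estimated marginal
// probability. Ties resolve to "no boundary".
Boundaries maxMarginalDecode(const std::vector<Boundaries>& samples) {
    assert(!samples.empty());
    const std::size_t gaps = samples.front().size();
    std::vector<std::size_t> votes(gaps, 0);
    for (const Boundaries& s : samples) {
        assert(s.size() == gaps);  // all samples must cover the same sequence
        for (std::size_t i = 0; i < gaps; ++i) {
            if (s[i]) ++votes[i];
        }
    }
    Boundaries decoded(gaps);
    for (std::size_t i = 0; i < gaps; ++i) {
        decoded[i] = 2 * votes[i] > samples.size();
    }
    return decoded;
}

// Join the symbols, inserting a space at every decoded boundary. Symbols are
// strings so that a multi-byte UTF-8 character can be kept as one unit.
std::string render(const std::vector<std::string>& symbols,
                   const Boundaries& b) {
    assert(symbols.empty() ? b.empty() : b.size() + 1 == symbols.size());
    std::string out;
    for (std::size_t i = 0; i < symbols.size(); ++i) {
        out += symbols[i];
        if (i < b.size() && b[i]) out += ' ';
    }
    return out;
}
```

Given three samples over the symbols `a b c d` in which only the middle gap is a boundary in the majority of samples, `maxMarginalDecode` keeps just that boundary and `render` yields `ab cd`. Deciding each gap from its own marginal tends to be more stable than returning a single sample, which is one plausible reason such decoding helps accuracy.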