We consider the problem of modeling the content structure of texts within a
specific domain, in terms of the topics the texts address and the order in
which these topics appear. We first present an effective knowledge-lean
method for learning content models from unannotated documents, utilizing a
novel adaptation of algorithms for Hidden Markov Models. We then apply our
method to two complementary tasks: information ordering and extractive
summarization. Our experiments show that incorporating content models in
these applications yields substantial improvement over