Abstract
We consider the problem of modeling the content structure of texts within a
specific domain, in terms of the topics the texts address and the order in
which these topics appear. We first present an effective knowledge-lean
method for learning content models from unannotated documents, utilizing a
novel adaptation of algorithms for Hidden Markov Models. We then apply our
method to two complementary tasks: information ordering and extractive
summarization. Our experiments show that incorporating content models in
these applications yields substantial improvement over
previously-proposed methods.
Code
The source code for this work can be downloaded from the link below.
Source code