A content selection component determines which information should be
conveyed in the output of a natural language generation system. We present
an efficient method for automatically learning content selection rules from
a corpus and its related database. Our modeling framework treats content
selection as a collective classification problem, thus allowing us to
capture contextual dependencies between input items. Experiments in a
sports domain demonstrate that this approach achieves a substantial
improvement over context-agnostic methods.
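
To make the contrast with context-agnostic selection concrete, the following is a minimal sketch and not the method used in this work. It assumes each input item has a hypothetical individual preference score in [0, 1] for being selected, and that some pairs of items carry a hypothetical non-negative link weight rewarding them for receiving the same label. The function names, scores, and weights are all illustrative assumptions, and the exhaustive search is only practical for toy inputs.

```python
from itertools import product


def independent_selection(scores):
    """Context-agnostic baseline: keep an item when its own score exceeds 0.5."""
    return [s > 0.5 for s in scores]


def collective_selection(scores, links):
    """Choose the joint labeling with the highest total value.

    scores[i] : assumed preference in [0, 1] for selecting item i
    links     : dict mapping (i, j) index pairs to an assumed weight that is
                earned when items i and j receive the same label
    Exhaustive search over all labelings; exponential in the number of items.
    """
    best_labels, best_value = None, float("-inf")
    for labels in product([False, True], repeat=len(scores)):
        # Individual term: s for a selected item, 1 - s for an omitted one.
        value = sum(s if keep else 1.0 - s for s, keep in zip(scores, labels))
        # Contextual term: reward linked items that agree.
        value += sum(w for (i, j), w in links.items() if labels[i] == labels[j])
        if value > best_value:
            best_labels, best_value = list(labels), value
    return best_labels


# Toy input: three database records, with records 0 and 1 linked.
scores = [0.9, 0.45, 0.1]
links = {(0, 1): 0.3}

independent = independent_selection(scores)
collective = collective_selection(scores, links)
print(independent)
print(collective)
```

In this toy input, the borderline item 1 is dropped when judged alone but kept when its link to the strongly preferred item 0 is taken into account, which is the kind of contextual dependency a collective formulation can capture.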
The source code for this work can be downloaded from the link below.