Extracting Paraphrases from a Parallel Corpus

Abstract

While paraphrasing is critical both for interpretation and generation of natural language, current systems use manual or semi-automatic methods to collect paraphrases. We present an unsupervised learning algorithm for identification of paraphrases from a corpus of multiple English translations of the same source text. Our approach yields phrasal and single-word lexical paraphrases as well as syntactic paraphrases.

Code

The source code for this work can be downloaded from the link below.

Source code

Regina Barzilay, Kathleen McKeown
