The Crowdsourced Language Assessment Corpus

Overview

The Crowdsourced Language Assessment Corpus (CLAC) consists of audio recordings and automatically generated transcripts from 1,832 speakers performing several speech and language tasks, along with metadata for each speaker. For a description of the corpus, see:

Haulcy, R., Glass, J. (2021) CLAC: A Speech Corpus of Healthy English Speakers. Proc. Interspeech 2021, 2966-2970, doi: 10.21437/Interspeech.2021-1810

If you use this data in your own publications, please cite the paper above.

This data is distributed under the Creative Commons Attribution-ShareAlike (CC BY-SA) license.