Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition

Overview

VocalSound is a free dataset of 21,024 crowdsourced recordings of laughter, sighs, coughs, throat clearing, sneezes, and sniffs from 3,365 unique subjects. It also includes speaker metadata such as age, gender, native language, country, and health condition.
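
As a rough illustration of the scale described above, the following Python sketch lists the six sound classes and computes the average number of recordings per subject from the two totals stated in the overview. The integer ids and the names CLASSES, LABEL_TO_ID, and recordings_per_subject are hypothetical and chosen only for this example; they are not the dataset's official label encoding or part of the baseline code.

```python
# Class names as listed in the overview. The integer ids below are an
# illustrative assumption, not the dataset's official label encoding.
CLASSES = ["laughter", "sigh", "cough", "throat clearing", "sneeze", "sniff"]
LABEL_TO_ID = {name: i for i, name in enumerate(CLASSES)}

# Totals quoted in the overview.
NUM_RECORDINGS = 21_024
NUM_SUBJECTS = 3_365


def recordings_per_subject(recordings: int, subjects: int) -> float:
    """Return the average number of recordings contributed per subject."""
    return recordings / subjects


avg = recordings_per_subject(NUM_RECORDINGS, NUM_SUBJECTS)
print(f"{len(CLASSES)} classes, about {avg:.2f} recordings per subject")
```

The average comes out slightly above six, which would be consistent with most subjects contributing roughly one recording per class, although the overview itself does not state how recordings are distributed across subjects or classes.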

The VocalSound dataset and the baseline code are available here.

If you use this data in your own publications, please cite our paper:

Y. Gong, J. Yu and J. Glass, "Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition," ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022. (PDF)

This data is distributed under the Creative Commons Attribution-ShareAlike (CC BY-SA) license (link).