Download 736 740 Zip Site
Five unique human-annotated descriptions for every audio clip.
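As a minimal sketch of how the five-captions-per-clip structure might be checked after download, the snippet below parses a caption table and flags any clip that does not have exactly five distinct captions. The column names (`file_name`, `caption_1` to `caption_5`), the helper names, and the sample rows are assumptions for illustration only; check them against the files actually distributed with the dataset.

```python
import csv
import io

# Hypothetical sample in an assumed layout: one row per clip, with columns
# file_name, caption_1 ... caption_5. The rows are placeholders, not real data.
SAMPLE_CSV = """file_name,caption_1,caption_2,caption_3,caption_4,caption_5
clip_a.wav,placeholder a1,placeholder a2,placeholder a3,placeholder a4,placeholder a5
clip_b.wav,placeholder b1,placeholder b2,placeholder b3,placeholder b4,placeholder b5
"""


def load_captions(handle):
    """Map each file name to the list of its five captions."""
    captions = {}
    for row in csv.DictReader(handle):
        captions[row["file_name"]] = [row[f"caption_{i}"] for i in range(1, 6)]
    return captions


def clips_without_five_unique(captions):
    """Return file names that do not have exactly five distinct captions."""
    return [name for name, caps in captions.items() if len(set(caps)) != 5]


captions = load_captions(io.StringIO(SAMPLE_CSV))
bad = clips_without_five_unique(captions)
```

To run this against a real file, replace the `io.StringIO` sample with `open(path, newline="", encoding="utf-8")`, where `path` points at the downloaded caption file.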
The dataset can be accessed through platforms like Zenodo.
Reference the original paper: Drossos, K., Lipping, S., & Virtanen, T. (2020). "Clotho: An Audio Captioning Dataset." Proc. IEEE ICASSP, pp. 736-740.