Ensembling End-to-End Deep Models for Computational Paralinguistics Tasks: ComParE 2020 Mask and Breathing Sub-Challenges

Publication date

2020-10-25

Authors

Markitantov, Maxim
Dresvyanskiy, Denis
Mamontov, Danila
Kaya, Heysem (ORCID 0000-0001-7947-5508; ISNI 000000049289651X)
Minker, Wolfgang
Karpov, Alexey

Document Type

Contribution to conference

Abstract

This paper describes deep learning approaches for the Mask and Breathing Sub-Challenges (SCs) of the INTERSPEECH 2020 Computational Paralinguistics Challenge. Motivated by the outstanding performance of state-of-the-art end-to-end (E2E) approaches, we explore and compare the effectiveness of different deep Convolutional Neural Network (CNN) architectures on raw data, log-Mel spectrograms, and Mel-Frequency Cepstral Coefficients. We apply a transfer learning approach to improve the models' efficiency and convergence speed. In the Mask SC, we conduct experiments with several pretrained CNN architectures on log-Mel spectrograms, as well as with Support Vector Machines on baseline features. For the Breathing SC, we propose an ensemble deep learning system that exploits E2E learning and sequence prediction. The first model, the E2E one, is based on a 1D CNN operating on raw speech signals and is coupled with Long Short-Term Memory layers for sequence modeling. The second model works with log-Mel features and is based on a pretrained 2D CNN stacked with Gated Recurrent Unit layers. To increase the performance of our models in both SCs, we use ensembles of the best deep neural models obtained from N-fold cross-validation on the combined challenge training and development datasets. Our results markedly outperform the challenge test set baselines in both SCs.
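
The ensembling step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how the fold models are fused, so the unweighted averaging of class probabilities below is an assumption, and it fits a classification task such as the Mask SC rather than the sequence regression of the Breathing SC. The function names `kfold_indices`, `average_probabilities`, and `predict_labels` are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence, Tuple


def kfold_indices(n_samples: int, n_folds: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into n_folds contiguous (train, validation) index pairs."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("need 2 <= n_folds <= n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    base, extra = divmod(n_samples, n_folds)
    splits = []
    start = 0
    for fold in range(n_folds):
        stop = start + base + (1 if fold < extra else 0)
        validation = indices[start:stop]
        train = indices[:start] + indices[stop:]
        splits.append((train, validation))
        start = stop
    return splits


def average_probabilities(
    fold_probs: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Average class probabilities over models.

    fold_probs[m][i][c] is the probability that model m assigns to class c
    for test sample i. Unweighted averaging is an assumed fusion rule.
    """
    n_models = len(fold_probs)
    n_samples = len(fold_probs[0])
    n_classes = len(fold_probs[0][0])
    return [
        [
            sum(fold_probs[m][i][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ]
        for i in range(n_samples)
    ]


def predict_labels(probs: Sequence[Sequence[float]]) -> List[int]:
    """Pick the class with the highest averaged probability for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]
```

In use, one model would be trained on each `train` split of the combined training and development data, selected on its `validation` split, and then applied to the test set; the per-model test probabilities are passed to `average_probabilities` and the result to `predict_labels`.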

Keywords

machine learning, computational paralinguistics, speech processing, breathing prediction, information fusion, neural networks, transfer learning, end-to-end models

Citation

Markitantov, M, Dresvyanskiy, D, Mamontov, D, Kaya, H, Minker, W & Karpov, A 2020, 'Ensembling End-to-End Deep Models for Computational Paralinguistics Tasks: ComParE 2020 Mask and Breathing Sub-Challenges', Paper presented at INTERSPEECH 2020, Shanghai, China, 25/10/20 - 29/10/20, pp. 2072-2076. https://doi.org/10.21437/Interspeech.2020-2666