Speech Emotion Recognition using Deep Convolutional Neural Networks improved by the fast Continuous Wavelet Transform

Van Zwol, BE; Langezaal, MA; Arts, LPA; Gatt, A; Van den Broek, EL

doi:10.3233/AISE230012

Files

AISE-32-AISE230012.pdf (1.13 MB)

Publication date

2023

Authors

Zwol, B. E. van

Langezaal, Mathijs A.

Arts, Lukas Petrus Anthonius

Gatt, Albert

van den Broek, E.L.

Editors

Bekaroo, Girish

Ben Allouch, Somaya

Mecella, Massimo

DOI

https://doi.org/10.3233/AISE230012

Document Type

Part of book

Collections

Utrecht University Repository

License

CC BY-NC

Abstract

The fast Continuous Wavelet Transform (fCWT) is used to improve Speech Emotion Recognition (SER) with Deep Convolutional Neural Networks (DCNNs). While computationally efficient, the fCWT’s time-frequency analysis overcomes the resolution limitations of traditional methods (e.g., the Short-Time Fourier Transform). fCWT-based DCNNs are compared to state-of-the-art DCNN SER systems. By comparing different wavelet parameters, we also provide an empirical strategy for balancing temporal and spectral features in speech signals. We suggest that this strategy is of generic interest for non-stationary signal processing where large amounts of data are available. fCWT’s potential for improving SER accuracy in real-time applications is confirmed. In parallel, the variance across the cross-validation folds confirmed deep learning’s vulnerability when data sets are not large.
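
To illustrate the kind of time-frequency input the abstract refers to, the following is a minimal sketch of a continuous wavelet transform computed with a Morlet wavelet in the Fourier domain. It is not the fCWT implementation and not the authors' pipeline; it is a naive NumPy version written only for illustration. The function name `morlet_cwt` and the parameter name `sigma` are assumptions introduced here. A larger `sigma` narrows the wavelet in frequency and widens it in time, which is one way to picture the trade-off between temporal and spectral features mentioned above.

```python
import numpy as np


def morlet_cwt(signal, fs, freqs, sigma=6.0):
    """Naive FFT-based CWT with an analytic Morlet wavelet (illustrative only).

    signal : 1-D real array
    fs     : sampling rate in Hz
    freqs  : centre frequencies in Hz, one scalogram row per entry
    sigma  : hypothetical name for the wavelet parameter that trades
             time resolution against frequency resolution
    Returns a complex array of shape (len(freqs), len(signal)).
    """
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    sig_hat = np.fft.fft(signal)
    # Angular frequency (rad/s) of every FFT bin.
    omega = 2.0 * np.pi * np.fft.fftfreq(n, d=1.0 / fs)
    out = np.empty((len(freqs), n), dtype=complex)
    for i, f in enumerate(freqs):
        # Scale chosen so that the wavelet's passband is centred on f.
        scale = sigma / (2.0 * np.pi * f)
        # Morlet wavelet in the frequency domain, positive frequencies only.
        psi_hat = np.exp(-0.5 * (scale * omega - sigma) ** 2) * (omega > 0)
        # Convolution with the wavelet is a product in the Fourier domain.
        out[i] = np.fft.ifft(sig_hat * psi_hat)
    return out


# Example: a one-second 50 Hz tone sampled at 1 kHz.
fs = 1000
t = np.arange(fs) / fs
tone = np.sin(2.0 * np.pi * 50.0 * t)
freqs = [25.0, 50.0, 100.0]
scalogram = np.abs(morlet_cwt(tone, fs, freqs))
strongest_row = int(np.argmax(scalogram.mean(axis=1)))
```

In this example the row for 50 Hz carries the most energy, so `strongest_row` indexes the 50 Hz entry of `freqs`. A magnitude array like `scalogram` is the sort of two-dimensional input that could be passed to a convolutional network, under the assumption that the network treats it as an image.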

Keywords

Deep Learning, Deep Convolutional Neural Networks, Signal Processing, Continuous Wavelet Transform, fCWT, Speech Emotion Recognition

Citation

Van Zwol, BE, Langezaal, MA, Arts, LPA, Gatt, A & Van den Broek, EL 2023, Speech Emotion Recognition using Deep Convolutional Neural Networks improved by the fast Continuous Wavelet Transform. in G Bekaroo, S Ben Allouch & M Mecella (eds), Workshop Proceedings of the 19th International Conference on Intelligent Environments (IE2023). Ambient Intelligence and Smart Environments, vol. 32, IOS Press, pp. 63-72. https://doi.org/10.3233/AISE230012

URI

https://dspace.library.uu.nl/handle/1874/430376
