Emotion based segmentation of musical audio
Publication date
2015
Editors
Advisors
Supervisors
Document Type
Part of book
Metadata
Show full item recordCollections
License
cc_by
Abstract
The dominant approach to musical emotion variation detection tracks emotion over time continuously and usually deals with time resolutions of one second. In this paper we discuss the problems associated with this approach and propose to move to bigger time resolutions when tracking emotion over time. We argue that it is more natural from the listener’s point of view to regard emotional variation in music as a progression of emotionally stable segments. In order to enable such tracking of emotion over time it is necessary to segment music at the emotional boundaries. To address this problem we conduct a formal evaluation of different segmentation methods as applied to a task of emotional boundary detection. We collect emotional boundary annotations from three annotators for 52 musical pieces from the RWC music collection that already have structural annotations from the SALAMI dataset. We investigate how well structural segmentation explains emotional segmentation and find that there is a large overlap, though about a quarter of emotional boundaries do not coincide with structural ones. We also study inter-annotator agreement on emotional segmentation. Lastly, we evaluate different unsupervised segmentation methods when applied to emotional boundary detection and find that, in terms of F-measure, the Structural Features method performs best.
Keywords
Citation
Aljanaki, A, Wiering, F & Veltkamp, R C 2015, Emotion based segmentation of musical audio. in Proceedings of the 16th Conference of the International Society for Music Information Retrieval (ISMIR 2015). pp. 770-776. https://doi.org/10.5281/zenodo.1418201