Audio Bigrams as a Unifying Model of Pitch-based Song Description

Publication date

2015

Authors

Van Balen, J.M.H. (ISNI 0000000419527523)
Wiering, F. (ORCID 0000-0002-2984-8932; ISNI 0000000053360131)
Veltkamp, Remco C. (ISNI 0000000109665680)

Document Type

Contribution to conference

Abstract

In this paper we provide a novel perspective on a family of music description algorithms that perform what could be referred to as 'soft' audio fingerprinting. These algorithms convert fragments of musical audio to one or more fixed-size vectors that can be used in distance computation and indexing, not just for traditional audio fingerprinting applications, but also for retrieval of cover songs from a large collection, and corpus-level description of music. We begin with a high-level overview of the algorithms. Next, we identify and formalize an underlying paradigm that allows us to see them as variations of the same model. Finally, we present pytch, a Python implementation of the model that accommodates several of the reviewed algorithms and allows for a variety of applications. The implementation is available online and open to extensions and contributions.

Keywords

Audio fingerprinting, Cover detection, Convolutional neural networks

Citation

Van Balen, J, Wiering, F & Veltkamp, R 2015, 'Audio Bigrams as a Unifying Model of Pitch-based Song Description', paper presented at the 11th International Symposium on Computer Music Multidisciplinary Research (CMMR), Plymouth, United Kingdom, 16/06/15 - 19/06/15.