Locating singing voice segments within music signals

Adam L. Berenzweig and Daniel P.W. Ellis

To appear at Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA01), Mohonk Mountain Resort, NY, 21-24 October 2001


Abstract

A sung vocal line is the prominent feature of much popular music. It would be useful to reliably locate the portions of a musical track during which the vocals are present, both as a 'signature' of the piece and as a precursor to automatic recognition of lyrics. Here, we approach this problem by using the acoustic classifier of a speech recognizer as a detector for speech-like sounds. Although singing (including a musical background) is a relatively poor match to an acoustic model trained on normal speech, we propose various statistics of the classifier's output to discriminate singing from instrumental accompaniment. A simple HMM allows us to find a best labeling sequence for this uncertain data. On a test set of forty 15-second excerpts of randomly selected music, our classifier achieved around 80% classification accuracy at the frame level. The utility of different features, and our plans for eventual lyrics recognition, are discussed.
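
As an illustration of how a simple HMM can turn uncertain per-frame evidence into a best labeling sequence, the following is a minimal sketch of two-state Viterbi decoding. It is not the authors' implementation: the function name `viterbi_two_state`, the per-frame input `p_vocal` (a stand-in for whatever classifier statistic is used), the uniform initial prior, and the self-transition probability `p_stay` are all assumptions made for this example.

```python
import math


def viterbi_two_state(p_vocal, p_stay=0.9):
    """Return the most likely 0/1 (instrumental/vocal) label per frame.

    p_vocal: hypothetical per-frame probabilities that singing is present.
    p_stay:  assumed probability of remaining in the same state between
             frames (an illustrative value, not taken from the paper).
    """
    if not p_vocal:
        return []
    eps = 1e-12  # keeps log() finite for probabilities of exactly 0 or 1
    log_stay = math.log(p_stay)
    log_switch = math.log(1.0 - p_stay)

    def emit(p):
        # Log-likelihood of the frame under (instrumental, vocal).
        p = min(max(p, eps), 1.0 - eps)
        return (math.log(1.0 - p), math.log(p))

    # Uniform initial prior; its constant term does not affect the argmax.
    score = list(emit(p_vocal[0]))
    back = []  # back[t][s] = best predecessor of state s at frame t + 1
    for p in p_vocal[1:]:
        e = emit(p)
        new_score, ptr = [], []
        for s in (0, 1):
            cands = [
                score[r] + (log_stay if r == s else log_switch)
                for r in (0, 1)
            ]
            best = 0 if cands[0] >= cands[1] else 1
            ptr.append(best)
            new_score.append(cands[best] + e[s])
        back.append(ptr)
        score = new_score

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Noisy scores: frames 3 and 8 disagree with their neighbours.
frames = [0.9, 0.8, 0.4, 0.9, 0.85, 0.1, 0.2, 0.6, 0.1, 0.15]
labels = viterbi_two_state(frames)
```

With these illustrative values, thresholding each frame at 0.5 would flip the label at the third and eighth frames, whereas the decoded sequence keeps a single vocal-to-instrumental transition, because two extra state switches cost more than the weak evidence in one frame can repay.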

