A sung vocal line is the prominent feature of much popular music. It would
be useful to reliably locate the portions of a musical track during which
the vocals are present, both as a `signature' of the piece and as a
precursor to automatic recognition of lyrics. Here, we approach this
problem by using the acoustic classifier of a speech recognizer as a detector
for speech-like sounds. Although singing (including a musical background)
is a relatively poor match to an acoustic model trained on normal
speech, we propose various statistics of the classifier's output in order to
discriminate singing from instrumental accompaniment. A simple HMM allows
us to find a best labeling sequence for this uncertain data. On a test
set of forty 15-second excerpts of randomly selected music, our
classifier achieved around 80\% classification accuracy at the frame
level. The utility of different features, and our plans for
eventual lyrics recognition, are discussed.
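
As an illustration of the HMM labeling step mentioned above, the following is a minimal sketch of Viterbi decoding over two states (accompaniment and singing). It is not the system evaluated here: the function name `viterbi_two_state`, the self-transition probability `p_stay`, the uniform prior, and the assumption that per-frame log-likelihoods are already available for each state are all hypothetical choices made for the example.

```python
import numpy as np

def viterbi_two_state(log_lik, p_stay=0.95):
    """Return the most likely state path for a 2-state HMM.

    log_lik: (T, 2) array of per-frame log-likelihoods, with columns
             [accompaniment, singing] (assumed input format).
    p_stay:  assumed probability of remaining in the same state
             between consecutive frames (hypothetical value).
    """
    log_lik = np.asarray(log_lik, dtype=float)
    T, S = log_lik.shape
    # log_trans[i, j] is the log-probability of moving from state i to j.
    log_trans = np.log(np.array([[p_stay, 1.0 - p_stay],
                                 [1.0 - p_stay, p_stay]]))
    log_prior = np.log(np.full(S, 1.0 / S))  # uniform prior (assumption)

    delta = np.empty((T, S))            # best log-score ending in each state
    back = np.zeros((T, S), dtype=int)  # best predecessor of each state
    delta[0] = log_prior + log_lik[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans  # scores[i, j]
        back[t] = np.argmax(scores, axis=0)
        delta[t] = scores[back[t], np.arange(S)] + log_lik[t]

    # Trace back from the best final state.
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(delta[-1]))
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path

# One weakly "singing" frame amid confident accompaniment frames.
noisy = np.log([[0.9, 0.1]] * 3 + [[0.4, 0.6]] + [[0.9, 0.1]] * 3)
labels = viterbi_two_state(noisy)
```

With a high `p_stay`, the isolated ambiguous frame is absorbed into the surrounding accompaniment segment, whereas a sustained run of singing-like frames still produces a state change. This is the sense in which a simple HMM can smooth uncertain frame-level evidence into a labeling sequence.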