Keynote Talks
Keynote Talk 1: Nonlinear Cochlear Signal Processing and Phoneme Perception
Speaker: Prof. Jont B. Allen, University of Illinois, Urbana-Champaign (with Marion Règnier, Sandeep Phatak, and Feipeng Li)

Abstract: The most important communication signal is human speech. It is helpful to think of speech communication in terms of Claude Shannon's information-theoretic channel model. When thus viewed, it immediately becomes clear that the most complex part of the speech communication channel is the auditory system (the receiver). In my opinion, even after years of work, relatively little is known about how the human auditory system decodes speech. Given cochlear damage, speech scores are greatly reduced, even with tiny amounts of noise. The exact reasons for this SNR-loss presently remain unclear, but I speculate that its source must be cochlear outer-hair-cell temporal processing, not central processing. Specifically, "temporal edge enhancement" of the speech signal and forward masking could easily be modified in such ears, leading to SNR-loss. Whatever the reason, SNR-loss is the key problem that needs to be fully researched. (A minimal formal definition of SNR-loss is sketched after this talk's listing.)

Speaker Biography: Jont B. Allen received his BS in Electrical Engineering from the University of Illinois, Urbana-Champaign, in 1966, and his MS and PhD in Electrical Engineering from the University of Pennsylvania in 1968 and 1970, respectively. After graduation he joined Bell Laboratories and was in the Acoustics Research Department in Murray Hill, NJ, from 1974 to 1996 as a Distinguished Member of Technical Staff. From 1996 Dr. Allen was a Technology Leader at AT&T Labs-Research. Since August 2003 he has been a tenured Associate Professor in ECE at the University of Illinois and on the research staff of the Beckman Institute, Urbana, IL. During his 32-year AT&T career Prof. Allen specialized in cochlear and middle-ear modeling and auditory signal processing. In the last 10 years he has concentrated on the problem of human speech recognition. His expertise spans signal processing, physical acoustics, acoustic power-flow and impedance measurements, cochlear modeling, auditory neurophysiology, auditory psychophysics, and human speech recognition. Prof. Allen is a Fellow (May 1981) of the Acoustical Society of America (ASA) and a Fellow (January 1985) of the Institute of Electrical and Electronics Engineers (IEEE). In 1986 he was awarded the IEEE Acoustics, Speech, and Signal Processing (ASSP) Society Meritorious Service Award, and in 2000 he received an IEEE Third Millennium Medal. He is a past member of the Executive Council of the ASA and of the Administration Committee (ADCOM) of the IEEE ASSP, and has served as Editor of the ASSP Transactions, as Chairman of the Publication Board of the ASSP Society, as General Chairman of the International Conference on ASSP (ICASSP-1988), and on numerous committees of both the ASA and the ASSP. He is presently a member of the ASA Publications Policy Board. He has organized several workshops and conferences on hearing research and signal processing, and in 1984 he received funding from NIH to host the 2nd International Mechanics of Hearing Workshop. He has a strong interest in electronic publications and has produced several CD-ROM publications, including suggesting, and then overseeing the technical details of, the publication of the J. Acoust. Soc. Am. in DjVu format; he also developed the first subject classification system for the IEEE Transactions on ASSP as well as the ASSP Magazine. In 1986-88 Prof. Allen participated in the development of the AT&T multi-band compression hearing aid, later sold under the ReSound and Danavox names, and served as a member of the ReSound and SoundID scientific advisory boards. Since 1987 he has been an Adjunct Associate Research Scientist in the Department of Otolaryngology at Columbia University and on the CUNY Speech and Hearing faculty (adjunct). In 1990 he was an Osher Fellow at the Exploratorium museum in San Francisco. In 1991-92 he served as an International Distinguished Lecturer for the IEEE Signal Processing Society. In 1993 he served on the Dean's Advisory Council at the University of Pennsylvania. In 1994 he spent 5 weeks as a Visiting Scientist and Distinguished Lecturer at the University of Calgary. Since 1994 Allen has been the CTO of Mimosa Acoustics, which manufactures equipment for diagnosing cochlear and middle-ear disorders based on distortion-product, power-flow, and acoustic-impedance measurements. In 2004 he received an IBM Faculty Award. Prof. Allen has more than 90 publications (36 peer-reviewed) and 16 patents in the areas of speech noise reduction, speech and audio coding, and hearing aids. In 2006 he published a 150-page research monograph, "Articulation and Intelligibility," which reviews the literature on human speech recognition from 1900 to the present in the context of models of human speech perception.
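As a point of reference for the abstract above, SNR-loss is commonly quantified as the additional signal-to-noise ratio a hearing-impaired (HI) listener needs, relative to a normal-hearing (NH) listener, to reach the same speech recognition score, with the 50%-correct point as a typical criterion. A minimal formalization (the notation SNR_50, the SNR at which recognition reaches 50% correct, is introduced here for illustration and is not taken from the talk):

\Delta\mathrm{SNR} = \mathrm{SNR}_{50}^{\mathrm{HI}} - \mathrm{SNR}_{50}^{\mathrm{NH}} \quad [\mathrm{dB}]

Under this definition, a listener with a 5 dB SNR-loss must be given speech in 5 dB more favorable noise than a normal-hearing listener requires to achieve the same score.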
Keynote Talk 2: Scalable Audio Coding for Heterogeneous Networks
Speaker: Prof. Bastiaan Kleijn, KTH School of Electrical Engineering, Stockholm

Abstract: The increasing heterogeneity of communication networks requires audio coders that can adapt instantaneously to both changing rate constraints and changing channels. This talk discusses a methodology aimed at always having the right coder configuration for the scenario at hand. To facilitate adaptivity, it is advantageous to have statistical models of all components in the coding process: the source, the encoder, the channel, the decoder, and the receiver. The model parameters for the source and the channel are estimated, and the receiver is described by an auditory model. Given the source, channel, and auditory models for a signal segment, the parameters for the encoder and decoder are computed to maximize the performance perceived by the receiver while satisfying network and user constraints. The approach implies that analytic relations must replace empirical findings. We briefly introduce high-rate quantization theory, on which derivations of such relations can be based (an illustrative relation of this kind is sketched after this talk's listing). We then discuss a number of specific techniques. We derive the optimal rate distribution between the signal model and the signal given the model. We design quantizers that allow trading rate variation against distortion variation. We describe a scalable multiple-description coding (MDC) method with an arbitrary number of descriptions that allows the instantaneous computation of the optimal MDC configuration for any packet-loss rate. We conclude by describing the architecture and performance of a complete system based on these principles.

Speaker Biography: Bastiaan Kleijn is a Professor at the School of Electrical Engineering at KTH (the Royal Institute of Technology) in Stockholm, Sweden, and Head of the Sound and Image Processing Laboratory. He is also a co-founder of Global IP Solutions, where he remains Chief Scientist. He holds an M.S. and a Ph.D. in Electrical Engineering from Stanford and Delft University of Technology, respectively, and a Ph.D. in Soil Science and an M.S. in Physics from the University of California. He worked on speech processing at AT&T Bell Laboratories from 1984 to 1996 and has since worked at KTH. He has held visiting positions at Delft University of Technology and the Vienna University of Technology. He has served, and continues to serve, on the editorial boards of five journals in the general area of signal processing. He is a Fellow of the IEEE.
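The high-rate quantization theory mentioned in the abstract supplies closed-form relations of the kind that can replace empirical tuning. As one standard illustration (the Gish-Pierce high-rate result for entropy-constrained scalar quantization of a smooth source, not a result specific to this talk), the mean squared distortion D of a source X coded at rate R bits per sample behaves as

D(R) \approx \frac{1}{12}\, 2^{2h(X)}\, 2^{-2R},

where h(X) is the differential entropy of the source. Each additional bit lowers the distortion by a factor of four (about 6 dB), so relations of this form let an encoder compute rate allocations analytically, on the fly, for whatever rate and loss conditions the network currently presents, rather than relying on configurations tuned offline.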
Keynote Talk 3: Signal Processing Not Spoken Here: The Human Experience of Aural Space
Speaker: Dr. Barry Blesser, Blesser Associates

Abstract: The human experience of sound and the signal processing of its parameters are often only weakly related, each with its own language and concepts. On the one hand, physical acoustics follows the laws of nature and signal processing follows the rules of mathematics. On the other hand, cultural acoustics depends on the lifestyles of individuals, who can flexibly select a cognitive strategy based on mood, personality, and situation. These differences illustrate the dichotomy between the phenomenology of sensory experience and the predictive formalism of the scientific method. Specifically, the concepts of causation, quantification, and consistency exist in the language of signal processing but only weakly in cultural acoustics. During one period in our history, spatial acoustics was viewed as undesirable noise; during other periods, it was viewed as sensory texture that gave each space a signature personality. For the first few years after the Boston Symphony Hall was built, it was considered a musical disaster; it is now considered one of the world's ten best musical spaces. Even though the acoustical properties of a space can strongly influence the emotions and behavior of its inhabitants, that influence is not readily measurable in controlled laboratory environments. The Perceptual Uncertainty Principle states that strong phenomena in real life may not be observable in the laboratory and, conversely, that consistent laboratory results may not be applicable to real life. The paradigm of every discipline is like a filter that allows some aspects of a phenomenon to be observed while hiding and distorting others. Sensory anthropologists observe human behavior in natural settings, while perceptual scientists observe constrained behavior in sterile environments with very simple stimuli. Consider a large physical space like a concert hall or cathedral. The physical acoustics of such spaces are so complex that they can never be accurately described, because thermal waves make them time-varying. Simulated spaces are only approximations because no computer can process sonic signals in three dimensions at the temporal and spatial bandwidths of the human auditory system. Creating spatial illusions is possible, but such illusions are often not consistent among individuals and sound sources. Moreover, illusions take advantage of evolutionary artifacts of our brain substrates, and there is no formal method for discovering such artifacts. The in-head localization of a sound source heard through stereophonic headphones is one example of such an artifact; the perceptual difference between the decay of a guitar string and the reverberation of a concert hall is another. Artificial reverberation is actually an aural illusion of a space. Modern neurophysiology research has clearly demonstrated that our brains retain their plasticity throughout life. We are how we live. A music conductor has better localization ability than the average individual. Those with weak vision can form an image of a space by listening. Spectral artifacts of a signal-processing algorithm may initially be inaudible but become very loud as a user spends more time listening.
Music enthusiasts who attend classical concerts in real halls may become very sensitive to the aural influence of seat selection, while those who listen to modern music through headphones may enjoy the fact that the simulated spaces are artistic contradictions that could never be built.

Speaker Biography: Barry Blesser received his S.B., S.M., and Ph.D. degrees in electrical engineering from M.I.T. in 1964, 1965, and 1969, respectively, specializing in communications. Following graduation, he spent nine years as an Associate Professor of Electrical Engineering and Computer Science at M.I.T., where he was active in research on cognitive perception and signal processing. Since that time, Dr. Blesser has pursued a consulting career in a variety of professional disciplines, including medicine, character recognition, control systems, digital audio, and technology management. Dr. Blesser was one of the pioneers of the digital audio revolution. He was the inventor of the first commercial digital audio reverberation system, the EMT-250; he helped start Lexicon; and he has published numerous papers on many aspects of digital audio. He was the system architect of Orban's Audicy digital audio workstation. In 1980 Dr. Blesser was President of the AES, and he has been elected to the Board of Governors several times. For the last 25 years he has served as a reviewer on the editorial board of the Journal of the AES, and he is currently Consulting Technical Editor for the journal. He is the recipient of the AES Silver, Bronze, and Governors medals and has received the award for best published paper several times. He is a Fellow of the AES. Recently, Dr. Blesser has expanded his professional activities to include technology management and risk engineering. He has held the positions of Director of Engineering and Chief Technology Officer at several major audio companies. He was a founder of Pencept, a company that specialized in recognizing hand-printed characters. Dr. Blesser holds several patents on artificial reverberation. He helped found a startup company, 25-Seven, Inc., specializing in temporal processing of audio signals for the broadcast industry. In 2007, MIT Press published his book, Spaces Speak, Are You Listening? Experiencing Aural Architecture, which considers the auditory experience of space from a wide range of perspectives. The Association of College and Research Libraries named it an Outstanding Academic Title for 2007.