Oct. 23, 2023 (Monday)
Title: “From Speech to Emotion to Mood: Mental Health Modeling in Real-World Environments”
Abstract: Emotions provide critical cues into our health and wellbeing. This is of particular importance in the context of mental health, where changes in emotion may signify changes in symptom severity. However, information about emotion and how it varies over time is often accessible only through survey methodology (e.g., ecological momentary assessment, EMA), which can become burdensome to participants. Automated speech emotion recognition systems could provide an alternative, yielding quantitative measures of emotion from acoustic data captured passively in a consented individual’s environment. However, speech emotion recognition systems often falter when presented with data collected from unconstrained natural environments due to issues with robustness, generalizability, and invalid assumptions. In this talk, I will discuss our journey in speech-centric mental health modeling, explaining whether, how, and when emotion recognition can be applied to natural, unconstrained speech data to measure changes in mental health symptom severity.
Bio: Emily Mower Provost is an Associate Professor and Associate Chair for Graduate Affairs in Computer Science and Engineering at the University of Michigan. She received her Ph.D. in Electrical Engineering from the University of Southern California (USC), Los Angeles, CA, in 2010. She has been awarded a Toyota Faculty Scholar Award (2020), a National Science Foundation CAREER Award (2017), the Oscar Stern Award for Depression Research (2015), and a National Science Foundation Graduate Research Fellowship (2004-2007). She is a co-author on papers selected for the IEEE Transactions on Affective Computing best papers of 2021, for Best Student Paper at ACM Multimedia 2014, and for the Classifier Sub-Challenge of the Interspeech 2009 Emotion Challenge. Her research interests are in human-centered speech and video processing, multimodal interface design, and speech-based assistive technology. The goals of her research are motivated by the complexities of the perception and expression of human behavior.
Oct. 24, 2023 (Tuesday)
Title: “Audio-Visual Learning”
Abstract: Perception systems that can both see and hear have great potential to unlock problems in video understanding, augmented reality, and embodied AI. I will present our recent work in audio-visual (AV) perception. First, we explore how audio’s spatial signals can augment visual understanding of 3D environments. This includes ideas for self-supervised feature learning from echoes, AV floorplan reconstruction, and active source separation, where an agent intelligently moves to hear things better in a busy environment. Throughout this line of work, we leverage our open-source SoundSpaces platform, which allows state-of-the-art rendering of highly realistic audio (alongside visuals) in real-world scanned environments. Next, building on these spatial AV ideas, we introduce new ways to enhance the audio stream – making it possible to transport a sound to a new physical environment observed in a photo, or to dereverberate speech so it is intelligible for machine and human ears alike. Finally, I will give an overview of Ego4D, a massive new egocentric video dataset built via a multi-institution collaboration that supports an array of exciting multimodal tasks.
Bio: Kristen Grauman is a Professor in the Department of Computer Science at the University of Texas at Austin and a Research Director at Facebook AI Research (FAIR). Her research in computer vision and machine learning focuses on visual recognition, video, and embodied perception. Before joining UT-Austin in 2007, she received her Ph.D. from MIT. She is an IEEE Fellow, AAAI Fellow, Sloan Fellow, and recipient of the 2013 Computers and Thought Award. She and her collaborators have been recognized with several Best Paper awards in computer vision, including a 2011 Marr Prize and a 2017 Helmholtz Prize (test of time award). She has served as Associate Editor-in-Chief for PAMI and Program Chair of CVPR 2015, NeurIPS 2018, and ICCV 2023.
Oct. 25, 2023 (Wednesday)
Open Panel Discussion: “Future of the audio research community in the era of big AI models”
We are collecting questions from the participants. Please join the #open-panel-discussion channel in the WASPAA 2023 Slack workspace and share your thoughts! The organizers will summarize the questions, designate a first respondent for each, and then welcome follow-up discussion.