Author Index

Author Session Title
A
Aihara, Ryo P4-11 FasTUSS: Faster Task-Aware Unified Source Separation
Akiyama, Hitoshi O1-1, P1-1 Kernel Ridge Regression Based Sound Field Estimation using a Rigid Spherical Microphone Array
Al-Sinan, Adnan P1-20 Fast Text-to-Audio Generation with Adversarial Post-Training
Aloradi, Ahmad P1-16 VoxATtack: A Multimodal Attack on Voice Anonymization Systems
Amengual Gari, Sebastia V. P3-5 Scene-wide Acoustic Parameter Estimation
Amruthalingam, Ludovic O3-4, P2-23 Listening Intention Estimation for Hearables with Natural Behavior Cues
Ananthabhotla, Ishwarya P3-5 Scene-wide Acoustic Parameter Estimation
O1-5, P1-6 Towards Perception-Informed Latent HRTF Representations
Antonacci, Fabio P1-22 Diffused Responsibility: Analyzing the Energy Consumption of Generative Text-to-Audio Diffusion Models
P3-4 Physics-Informed Transfer Learning for Data-Driven Sound Source Reconstruction in Near-Field Acoustic Holography
P4-5 Low-Rank Adaptation of Deep Prior Neural Networks For Room Impulse Response Reconstruction
Archer-Boyd, Alan W. P3-17 Beyond Architecture: The Critical Impact of Inference Overlap on Music Source Separation Benchmarks
Arellano, Silvia P1-4 Room Impulse Response Generation Conditioned on Acoustic Parameters
Aroudi, Ali P4-8 Controlling the Parameterized Multi-channel Wiener Filter using a tiny neural network
P1-13 Microphone Occlusion Mitigation for Own-Voice Enhancement in Head-Worn Microphone Arrays Using Switching-Adaptive Beamforming
Arteaga, Daniel P1-4 Room Impulse Response Generation Conditioned on Acoustic Parameters
Azcarreta, Juan P4-8 Controlling the Parameterized Multi-channel Wiener Filter using a tiny neural network
B
Bacchiani, Michiel O4-3, P4-9 Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration
P3-9 ReverbMiipher: Generative Speech Restoration meets Reverberation Characteristics Controllability
Badeau, Roland P3-10 IS³: Generic Impulsive–Stationary Sound Separation in Acoustic Scenes using Deep Filtering
Bagheri Sereshki, Saeed P1-13 Microphone Occlusion Mitigation for Own-Voice Enhancement in Head-Worn Microphone Arrays Using Switching-Adaptive Beamforming
Bando, Yoshiaki P3-16 Is MixIT Really Unsuitable for Correlated Sources? Exploring MixIT for Unsupervised Pre-training in Music Source Separation
Baumann, Pascal O3-4, P2-23 Listening Intention Estimation for Hearables with Natural Behavior Cues
Bäumer, Timm-Jonas O4-1, P4-3 Beamforming with Interaural Time-To-Level Difference Conversion for Hearing Loss Compensation
Bavu, Éric D-3 Real-Time Speech Enhancement in Noise for Throat Microphone Using Neural Audio Codec as Foundation Model
Becker, Luca P3-13 Contrastive Representation Learning for Privacy-Preserving Fine-Tuning of Audio-Visual Speech Recognition
Bello, Juan Pablo P3-20 Balancing Information Preservation and Disentanglement in Self-Supervised Music Representation Learning
P1-9 Latent Acoustic Mapping for Direction of Arrival Estimation: A Self-Supervised Approach
Benetos, Emmanouil P1-24 RUMAA: Repeat-Aware Unified Music Audio Analysis for Score-Performance Alignment, Transcription, and Mistake Detection
Bereuter, Paul A. P3-18 Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models
Berg-Kirkpatrick, Taylor P1-20 Fast Text-to-Audio Generation with Adversarial Post-Training
Berger, Clémentine P3-10 IS³: Generic Impulsive–Stationary Sound Separation in Acoustic Scenes using Deep Filtering
Berger, Jonathan P1-11 The Test of Auditory-Vocal Affect (TAVA) dataset
Bharadwaj, Shikhar O2-4, P2-4 OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder
Bhattacharya, Gautam P1-4 Room Impulse Response Generation Conditioned on Acoustic Parameters
Bibbó, Gabriel D-7 Speech Removal Framework for Privacy-preserving Audio Recordings
Bilen, Cagdas P4-8 Controlling the Parameterized Multi-channel Wiener Filter using a tiny neural network
Blau, Matthias P4-4 Soft-Constrained Spatially Selective Active Noise Control for Open-fitting Hearables
Bologni, Giovanni P3-6 Cyclic Multichannel Wiener Filter for Acoustic Beamforming
Bovbjerg, Holger Severin P4-2 Learning Robust Spatial Representations from Binaural Audio through Feature Distillation
Bowling, Daniel L. P1-11 The Test of Auditory-Vocal Affect (TAVA) dataset
Bralios, Dimitrios P4-22 Learning to Upsample and Upmix Audio in the Latent Domain
Braun, Sebastian P1-10 Distributed Asynchronous Device Speech Enhancement via Windowed Cross-Attention
Brendel, Andreas P3-12 UBGAN: Enhancing Coded Speech with Blind and Guided Bandwidth Extension
Brümann, Klaus P4-1 Incremental Averaging Method to Improve Graph-Based Time-Difference-of-Arrival Estimation
Burgoyne, John Ashley P2-18 Musical Source Separation Bake-Off: Comparing Objective Metrics with Human Perception
Büthe, Jan O3-3, P2-22 A Lightweight and Robust Method for Blind Wideband-to-Fullband Extension of Speech
C
Calamia, Paul O1-5, P1-6 Towards Perception-Informed Latent HRTF Representations
Cao, Boxuan O3-5, P2-24 Unveiling the Best Practices for Applying Speech Foundation Models to Speech Intelligibility Prediction for Hearing-Impaired People
Carr, CJ P1-20 Fast Text-to-Audio Generation with Adversarial Post-Training
Casebeer, Jonah P4-22 Learning to Upsample and Upmix Audio in the Latent Domain
Caspe, Franco Santiago D-1 Neural Audio Synthesis for Non-Keyboard Instruments
Chang, Sungkyun P1-24 RUMAA: Repeat-Aware Unified Music Audio Analysis for Score-Performance Alignment, Transcription, and Mistake Detection
Chen, Haonan P2-8 Temporal Adaptation of Pre-trained Foundation Models for Music Structure Analysis
Chen, Jianchong P1-12 Unsupervised Multi-channel Speech Dereverberation via Diffusion
Chen, Jitong P2-8 Temporal Adaptation of Pre-trained Foundation Models for Music Structure Analysis
Chen, Ke P4-21 DiTVC: One-Shot Voice Conversion via Diffusion Transformer with Environment and Speaking Rate Cloning
Chen, Moran P4-10 Long-Context Modeling Networks for Monaural Speech Enhancement: A Comparative Study
Choi, Kwanghee O2-4, P2-4 OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder
Choudhari, Vishal O2-3, P2-3 Bridging Ears and Eyes: Analyzing Audio and Visual Large Language Models to Humans in Visible Sound Recognition and Reducing Their Sensory Gap via Cross-Modal Distillation
Churchwell, Cameron O4-5, P4-19 Combolutional Neural Networks
Cohen, Israel O1-2, P1-2 Optimal Region-of-Interest Beamforming for Audio Conferencing with Dual Perpendicular Sparse Circular Sectors
Comanducci, Luca P1-22 Diffused Responsibility: Analyzing the Energy Consumption of Generative Text-to-Audio Diffusion Models
Condit-Schultz, Nathaniel P4-14 The Perception of Phase Intercept Distortion and its Application in Data Augmentation
Corey, Ryan D-5 Wireless Group Conversation Enhancement with the Tympan Open-Source Hearing Aid Platform
Cornell, Samuele O2-4, P2-4 OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder
Cui, Shuyang P2-10 SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet
Cvetkovic, Zoran P2-11 Neural-Network-Based Interpolation of Late Reverberation in Coupled Spaces Using the Common Slopes Model
P3-22 Perceptually-Driven Panning for an Extended Listening Area
D
Dal Santo, Gloria P2-11 Neural-Network-Based Interpolation of Late Reverberation in Coupled Spaces Using the Common Slopes Model
Das, Arnab P2-12 Two Views, One Truth: Spectral and Self-Supervised Features Fusion for Robust Speech Deepfake Detection
Das, Orchisama P2-11 Neural-Network-Based Interpolation of Late Reverberation in Coupled Spaces Using the Common Slopes Model
De Sena, Enzo P3-22 Perceptually-Driven Panning for an Extended Listening Area
de Vries, Johannes W. O4-1, P4-3 Beamforming with Interaural Time-To-Level Difference Conversion for Hearing Loss Compensation
Deacon, Thomas D-7 Speech Removal Framework for Privacy-preserving Audio Recordings
Deshmukh, Soham O2-4, P2-4 OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder
Di Carlo, Diego P3-2 Physically Informed Spatial Regularization for Sound Event Localization and Detection
Ding, Sivan P3-20 Balancing Information Preservation and Disentanglement in Self-Supervised Music Representation Learning
Dixit, Satvik P1-17 Learning Perceptually Relevant Temporal Envelope Morphing
Dixon, Simon P1-24 RUMAA: Repeat-Aware Unified Music Audio Analysis for Score-Performance Alignment, Transcription, and Mistake Detection
P4-20 Self-Supervised Representation Learning with a JEPA Framework for Multi-instrument Music Transcription
Doclo, Simon O1-2, P1-2 Optimal Region-of-Interest Beamforming for Audio Conferencing with Dual Perpendicular Sparse Circular Sectors
P4-1 Incremental Averaging Method to Improve Graph-Based Time-Difference-of-Arrival Estimation
P4-4 Soft-Constrained Spatially Selective Active Noise Control for Open-fitting Hearables
Doh, Seungheon P2-7 Can Large Language Models Predict Audio Effects Parameters from Natural Language?
Donahue, Chris P1-17 Learning Perceptually Relevant Temporal Envelope Morphing
Donley, Jacob P4-8 Controlling the Parameterized Multi-channel Wiener Filter using a tiny neural network
P2-14 A Unified Framework for Evaluating DNN-Based Feedforward, Feedback, and Hybrid Active Noise Cancellation
Doucet, Arnaud P2-17 Source Separation by Flow Matching
Drossos, Konstantinos P1-14 Multi-Utterance Speech Separation and Association Trained on Short Segments
Duan, Zhiyao O1-5, P1-6 Towards Perception-Informed Latent HRTF Representations
E
El Kheir, Yassine P2-12 Two Views, One Truth: Spectral and Self-Supervised Features Fusion for Robust Speech Deepfake Detection
Elhilali, Mounya O2-1, P2-1 FlexSED: Towards Open-Vocabulary Sound Event Detection
P3-1 SynSonic: Augmenting Sound Event Detection through Text-to-Audio Diffusion ControlNet and Effective Sample Filtering
Erdogan, Enes Erdem P2-12 Two Views, One Truth: Spectral and Self-Supervised Features Fusion for Robust Speech Deepfake Detection
Essid, Slim P3-10 IS³: Generic Impulsive–Stationary Sound Separation in Acoustic Scenes using Deep Filtering
Evans, Zach P1-20 Fast Text-to-Audio Generation with Adversarial Post-Training
F
Falcon-Perez, Ricardo P3-5 Scene-wide Acoustic Parameter Estimation
Fan, Junyi P3-15 JSQA: Speech Quality Assessment with Perceptually-Inspired Contrastive Pretraining Based on JND Audio Pairs
Fang, Huajian O3-2, P2-21 Self-Steering Deep Non-Linear Spatially Selective Filters for Efficient Extraction of Moving Speakers under Weak Guidance
P2-14 A Unified Framework for Evaluating DNN-Based Feedforward, Feedback, and Hybrid Active Noise Cancellation
Fazekas, George P4-18 Improving Inference-Time Optimisation for Vocal Effects Style Transfer with a Gaussian Prior
D-2 PCA-DiffVox: Augmenting Vocal Effects Tweakability With a Bijective Latent Space
Fazi, Filippo P3-23 Theoretical Analysis of Recursive Implementations of Multi-Channel Cross-Talk Cancellation Systems
Feng, Jianyuan P3-11 Hybrid-Sep: Language-queried audio source separation via pre-trained Model Fusion and Adversarial Consistent Training
Fingscheidt, Tim P3-7 EffDiffSE: Efficient Diffusion-Based Frequency-Domain Speech Enhancement with Hybrid Discriminative and Generative DNNs
Finkelstein, Adam P4-21 DiTVC: One-Shot Voice Conversion via Diffusion Transformer with Environment and Speaking Rate Cloning
Fontaine, Mathieu P3-2 Physically Informed Spatial Regularization for Sound Event Localization and Detection
Franck, Andreas P3-23 Theoretical Analysis of Recursive Implementations of Multi-Channel Cross-Talk Cancellation Systems
Francl, Andrew O1-5, P1-6 Towards Perception-Informed Latent HRTF Representations
Fu, Minghao P1-23 Benchmarking Sub-Genre Classification for Mainstage Dance Music
Fu, Yihui P3-7 EffDiffSE: Efficient Diffusion-Based Frequency-Domain Speech Enhancement with Hybrid Discriminative and Generative DNNs
Fuchs, Guillaume P3-12 UBGAN: Enhancing Coded Speech with Blind and Guided Bandwidth Extension
Fuentes, Magdalena P1-21 Post-Training Quantization for Audio Diffusion Transformers
P3-20 Balancing Information Preservation and Disentanglement in Self-Supervised Music Representation Learning
Fukayama, Satoru O2-4, P2-4 OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder
G
Gao, Ruohan P3-5 Scene-wide Acoustic Parameter Estimation
O1-5, P1-6 Towards Perception-Informed Latent HRTF Representations
Gaznepoglu, Ünal Ege P1-16 VoxATtack: A Multimodal Attack on Voice Anonymization Systems
Gerkmann, Timo O3-2, P2-21 Self-Steering Deep Non-Linear Spatially Selective Filters for Efficient Extraction of Moving Speakers under Weak Guidance
D-8 Demonstration of LipDiffuser: Lip-to-Speech Generation with Conditional Diffusion Models Running on a Portable Device
Germain, François G. P4-11 FasTUSS: Faster Task-Aware Unified Source Separation
O1-3, P1-3 Physics-Informed Direction-Aware Neural Acoustic Fields
Giurda, Ruksana O3-4, P2-23 Listening Intention Estimation for Hearables with Natural Behavior Cues
Glaus, Seraina O3-4, P2-23 Listening Intention Estimation for Hearables with Natural Behavior Cues
Gómez-Cañón, Juan S. P1-11 The Test of Auditory-Vocal Affect (TAVA) dataset
Götz, Georg D-6 Next-Generation Synthetic Data Techniques for Training, Evaluation, and Prototyping in Audio Signal Processing
Grinstein, Eric P4-8 Controlling the Parameterized Multi-channel Wiener Filter using a tiny neural network
Gröger, Fabian O3-4, P2-23 Listening Intention Estimation for Hearables with Natural Behavior Cues
Grundhuber, Philipp P2-15 Robust Speech Activity Detection in the Presence of Singing Voice
Guevara-Rukoz, Adriana P3-9 ReverbMiipher: Generative Speech Restoration meets Reverberation Characteristics Controllability
Guo, Weizhe O2-1, P2-1 FlexSED: Towards Open-Vocabulary Sound Event Detection
Gupta, Kishan P3-12 UBGAN: Enhancing Coded Speech with Blind and Guided Bandwidth Extension
Gusó, Enric P4-6 MB-RIRs: a Synthetic Room Impulse Response Dataset with Frequency-Dependent Absorption Coefficients
P1-19 Conditional Wave-U-Net for Acoustic Matching in Shared XR Environments
H
Habets, Emanuël A. P. P3-3 Stereo Reproduction in the Presence of Sample Rate Offsets
P1-16 VoxATtack: A Multimodal Attack on Voice Anonymization Systems
P4-15 Device-Centric Room Impulse Response Augmentation Evaluated on Room Geometry Inference
P2-15 Robust Speech Activity Detection in the Presence of Singing Voice
Hai, Jiarui O2-1, P2-1 FlexSED: Towards Open-Vocabulary Sound Event Detection
P3-1 SynSonic: Augmenting Sound Event Detection through Text-to-Audio Diffusion ControlNet and Effective Sample Filtering
Halimeh, Mhd Modar P2-15 Robust Speech Activity Detection in the Presence of Singing Voice
Harju, Manu P1-8 Sound Event Detection with Audio-Text Models and Heterogeneous Temporal Annotations
Haruta, Chiho O4-4, P4-12 Context-Aware Query Refinement for Target Sound Extraction: Handling Partially Matched Queries
Hauret, Julien D-3 Real-Time Speech Enhancement in Noise for Throat Microphone Using Neural Audio Codec as Foundation Model
Heller, Laurie M. P1-17 Learning Perceptually Relevant Temporal Envelope Morphing
Hendriks, Richard C. P3-6 Cyclic Multichannel Wiener Filter for Acoustic Beamforming
O4-1, P4-3 Beamforming with Interaural Time-To-Level Difference Conversion for Hearing Loss Compensation
Hennequin, Romain P1-18 Multi-Class-Token Transformer for Multitask Self-supervised Music Information Retrieval
Hershey, John R. P2-17 Source Separation by Flow Matching
Heusdens, Richard P3-6 Cyclic Multichannel Wiener Filter for Acoustic Beamforming
O4-1, P4-3 Beamforming with Interaural Time-To-Level Difference Conversion for Hearing Loss Compensation
Higuchi, Takuya O3-1, P2-20 Rethinking Non-Negative Matrix Factorization with Implicit Neural Representations
Hirano, Masato P2-13 Schrödinger Bridge Consistency Trajectory Models for Speech Enhancement
Hiruma, Nobuhiko O4-4, P4-12 Context-Aware Query Refinement for Target Sound Extraction: Handling Partially Matched Queries
Hsieh, Tsun-An P2-19 TGIF: Talker Group-Informed Familiarization of Target Speaker Extraction
Hung, Yun-Ning (Amy) P4-17 Moises-Light: Resource-efficient Band-split U-Net For Music Source Separation
P3-19 Generating Separated Singing Vocals Using a Diffusion Model Conditioned on Music Mixtures
I
Ick, Christopher O1-3, P1-3 Physics-Informed Direction-Aware Neural Acoustic Fields
Imoto, Keisuke O4-4, P4-12 Context-Aware Query Refinement for Target Sound Extraction: Handling Partially Matched Queries
Iodice, Gian Marco P1-20 Fast Text-to-Audio Generation with Adversarial Post-Training
Ishikawa, Haruko O4-3, P4-9 Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration
P3-9 ReverbMiipher: Generative Speech Restoration meets Reverberation Characteristics Controllability
Itou, Hiroaki O1-4, P1-5 Source and Sensor Placement for Sound Field Control Based on Mean Square Error with Prior Spatial Covariance
Itzhak, Gal O1-2, P1-2 Optimal Region-of-Interest Beamforming for Audio Conferencing with Dual Perpendicular Sparse Circular Sectors
J
Jaffe, Noah P2-18 Musical Source Separation Bake-Off: Comparing Objective Metrics with Human Perception
Jang, Inseon P1-15 Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine
Jensen, Jesper P4-2 Learning Robust Spatial Representations from Binaural Audio through Feature Distillation
Jiang, Hongyu P1-23 Benchmarking Sub-Genre Classification for Mainstage Dance Music
Jiang, Xilin O2-3, P2-3 Bridging Ears and Eyes: Analyzing Audio and Visual Large Language Models to Humans in Visible Sound Recognition and Reducing Their Sensory Gap via Cross-Modal Distillation
Jin, Zeyu P4-21 DiTVC: One-Shot Voice Conversion via Diffusion Transformer with Environment and Speaking Rate Cloning
Joubaud, Thomas D-3 Real-Time Speech Enhancement in Noise for Throat Microphone Using Neural Audio Codec as Foundation Model
K
Kamado, Noriyoshi O1-4, P1-5 Source and Sensor Placement for Sound Field Control Based on Mean Square Error with Prior Spatial Covariance
Karita, Shigeki O4-3, P4-9 Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration
P3-9 ReverbMiipher: Generative Speech Restoration meets Reverberation Characteristics Controllability
Khandelwal, Tanmay P1-21 Post-Training Quantization for Audio Diffusion Transformers
Khera, Harnick P3-17 Beyond Architecture: The Critical Impact of Inference Overlap on Music Source Separation Benchmarks
Kienegger, Jakob O3-2, P2-21 Self-Steering Deep Non-Linear Spatially Selective Filters for Efficient Extraction of Moving Speakers under Weak Guidance
Kim, Minje P3-8 Adaptive Slimming for Scalable and Efficient Speech Enhancement
P2-19 TGIF: Talker Group-Informed Familiarization of Target Speaker Extraction
P1-15 Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine
O4-5, P4-19 Combolutional Neural Networks
Klapuri, Anssi O2-6, P2-6 Transcribing Rhythmic Patterns of the Guitar Track in Polyphonic Music
Kleijn, W. Bastiaan O4-2, P4-7 Robust One-step Speech Enhancement via Consistency Distillation
Koizumi, Yuma O4-3, P4-9 Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration
Kong, Yuexuan P1-18 Multi-Class-Token Transformer for Multitask Self-supervised Music Information Retrieval
Koo, Junghyun P4-18 Improving Inference-Time Optimisation for Vocal Effects Style Transfer with a Gaussian Prior
P2-7 Can Large Language Models Predict Audio Effects Parameters from Natural Language?
D-2 PCA-DiffVox: Augmenting Vocal Effects Tweakability With a Bijective Latent Space
Korse, Srikanth P3-3 Stereo Reproduction in the Presence of Sample Rate Offsets
P3-12 UBGAN: Enhancing Coded Speech with Blind and Guided Bandwidth Extension
Korzeniowski, Filip P4-17 Moises-Light: Resource-efficient Band-split U-Net For Music Source Separation
Koyama, Shoichi P4-23 Head-Related Transfer Function Individualization Using Anthropometric Features and Spatially Independent Latent Representation
P4-5 Low-Rank Adaptation of Deep Prior Neural Networks For Room Impulse Response Reconstruction
O1-4, P1-5 Source and Sensor Placement for Sound Field Control Based on Mean Square Error with Prior Spatial Covariance
Kozuka, Shihori O1-4, P1-5 Source and Sensor Placement for Sound Field Control Based on Mean Square Error with Prior Spatial Covariance
Kumar, Rithesh P4-21 DiTVC: One-Shot Voice Conversion via Diffusion Transformer with Environment and Speaking Rate Cloning
Kumar, Sonal P3-21 SILA: Signal-to-Language Augmentation for Enhanced Control in Text-to-Audio Generation
Kusaka, Yuta P2-9 Learn from Virtual Guitar: A Comparative Analysis of Automatic Guitar Transcription using Synthetic and Real Audio
Kuznetsova, Anastasia P1-15 Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine
L
Lagrange, Mathieu P1-18 Multi-Class-Token Transformer for Multitask Self-supervised Music Information Retrieval
Laroche, Clement P3-8 Adaptive Slimming for Scalable and Efficient Speech Enhancement
Le Roux, Jonathan P4-11 FasTUSS: Faster Task-Aware Unified Source Separation
O1-3, P1-3 Physics-Informed Direction-Aware Neural Acoustic Fields
Lee, Jung-Suk P1-13 Microphone Occlusion Mitigation for Own-Voice Enhancement in Head-Worn Microphone Arrays Using Switching-Adaptive Beamforming
Lee, Sanha P4-8 Controlling the Parameterized Multi-channel Wiener Filter using a tiny neural network
Li, Cole P4-8 Controlling the Parameterized Multi-channel Wiener Filter using a tiny neural network
Li, Guangzheng P3-11 Hybrid-Sep: Language-queried audio source separation via pre-trained Model Fusion and Adversarial Consistent Training
Li, Haizhou P4-10 Long-Context Modeling Networks for Monaural Speech Enhancement: A Comparative Study
Li, Henry P2-17 Source Separation by Flow Matching
Li, Linkai O3-5, P2-24 Unveiling the Best Practices for Applying Speech Foundation Models to Speech Intelligibility Prediction for Hearing-Impaired People
Li, Xinglin P1-23 Benchmarking Sub-Genre Classification for Mainstage Dance Music
Li, Xinyu P1-23 Benchmarking Sub-Genre Classification for Mainstage Dance Music
Liao, Wei-Hsiang P4-18 Improving Inference-Time Optimisation for Vocal Effects Style Transfer with a Gaussian Prior
P2-7 Can Large Language Models Predict Audio Effects Parameters from Natural Language?
D-2 PCA-DiffVox: Augmenting Vocal Effects Tweakability With a Bijective Latent Space
Lim, Wootaek P1-15 Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine
Lionetti, Simone O3-4, P2-23 Listening Intention Estimation for Hearables with Natural Behavior Cues
Liu, Haocheng P3-2 Physically Informed Spatial Regularization for Sound Event Localization and Detection
Liu, Hexin P4-10 Long-Context Modeling Networks for Monaural Speech Enhancement: A Comparative Study
Lladó, Pedro P3-22 Perceptually-Driven Panning for an Extended Listening Area
Lostanlen, Vincent P1-18 Multi-Class-Token Transformer for Multitask Self-supervised Music Information Retrieval
Luan, Xinmeng P3-4 Physics-Informed Transfer Learning for Data-Driven Sound Source Reconstruction in Near-Field Acoustic Holography
Luberadzka, Joanna P4-6 MB-RIRs: a Synthetic Room Impulse Response Dataset with Frequency-Dependent Absorption Coefficients
P1-19 Conditional Wave-U-Net for Acoustic Matching in Shared XR Environments
Lukoianov, Aleksandr O2-6, P2-6 Transcribing Rhythmic Patterns of the Guitar Track in Polyphonic Music
M
Ma, Teng D-4 Real-Time System for Audio-Visual Target Speech Enhancement
Maezawa, Akira P2-9 Learn from Virtual Guitar: A Comparative Analysis of Automatic Guitar Transcription using Synthetic and Real Audio
Mannanova, Alina O3-2, P2-21 Self-Steering Deep Non-Linear Spatially Selective Filters for Efficient Extraction of Moving Speakers under Weak Guidance
Manocha, Dinesh P3-21 SILA: Signal-to-Language Augmentation for Enhanced Control in Text-to-Audio Generation
Martin, Rainer P3-13 Contrastive Representation Learning for Privacy-Preserving Fine-Tuning of Audio-Visual Speech Recognition
Martínez-Ramírez, Marco A. P4-18 Improving Inference-Time Optimisation for Vocal Effects Style Transfer with a Gaussian Prior
P2-7 Can Large Language Models Predict Audio Effects Parameters from Natural Language?
D-2 PCA-DiffVox: Augmenting Vocal Effects Tweakability With a Bijective Latent Space
Masuyama, Yoshiki P4-11 FasTUSS: Faster Task-Aware Unified Source Separation
O1-3, P1-3 Physics-Informed Direction-Aware Neural Acoustic Fields
Matsuda, Ryo O1-1, P1-1 Kernel Ridge Regression Based Sound Field Estimation using a Rigid Spherical Microphone Array
Mauch, Matthias P4-20 Self-Supervised Representation Learning with a JEPA Framework for Multi-instrument Music Transcription
Mawalim, Candy Olivia P4-16 Modeling Multi-Level Hearing Loss for Speech Intelligibility Prediction
McAuley, Julian P1-20 Fast Text-to-Audio Generation with Adversarial Post-Training
McPherson, Andrew D-1 Neural Audio Synthesis for Non-Keyboard Instruments
Mesaros, Annamaria P1-7 Online Incremental Learning for Audio Classification Using a Pretrained Audio Model
P1-8 Sound Event Detection with Audio-Text Models and Heterogeneous Temporal Annotations
Meseguer-Brocal, Gabriel P1-18 Multi-Class-Token Transformer for Multitask Self-supervised Music Information Retrieval
Mesgarani, Nima O2-3, P2-3 Bridging Ears and Eyes: Analyzing Audio and Visual Large Language Models to Humans in Visible Sound Recognition and Reducing Their Sensory Gap via Cross-Modal Distillation
Miccini, Riccardo P3-8 Adaptive Slimming for Scalable and Efficient Speech Enhancement
Middelberg, Wiebke P1-13 Microphone Occlusion Mitigation for Own-Voice Enhancement in Head-Worn Microphone Arrays Using Switching-Adaptive Beamforming
Miotello, Federico P4-5 Low-Rank Adaptation of Deep Prior Neural Networks For Room Impulse Response Reconstruction
Mitcheltree, Christopher O2-5, P2-5 Modulation Discovery with Differentiable Digital Signal Processing
Mitsufuji, Yuki P4-18 Improving Inference-Time Optimisation for Vocal Effects Style Transfer with a Gaussian Prior
P2-10 SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet
P2-13 Schrödinger Bridge Consistency Trajectory Models for Speech Enhancement
P2-7 Can Large Language Models Predict Audio Effects Parameters from Natural Language?
D-2 PCA-DiffVox: Augmenting Vocal Effects Tweakability With a Bijective Latent Space
Mo, Changgeng O3-5, P2-24 Unveiling the Best Practices for Applying Speech Foundation Models to Speech Intelligibility Prediction for Hearing-Impaired People
Möller, Sebastian P2-12 Two Views, One Truth: Spectral and Self-Supervised Features Fusion for Robust Speech Deepfake Detection
Mueckl, Gregor P3-5 Scene-wide Acoustic Parameter Estimation
Mulimani, Manjunath P1-7 Online Incremental Learning for Audio Classification Using a Pretrained Audio Model
Murata, Naoki P2-13 Schrödinger Bridge Consistency Trajectory Models for Speech Enhancement
N
Nakamura, Tomohiko P4-23 Head-Related Transfer Function Individualization Using Anthropometric Features and Spatially Independent Latent Representation
Nakata, Wataru P3-9 ReverbMiipher: Generative Speech Restoration meets Reverberation Characteristics Controllability
Nam, Juhan P2-7 Can Large Language Models Predict Audio Effects Parameters from Natural Language?
Neidhardt, Annika P3-22 Perceptually-Driven Panning for an Extended Listening Area
Nielsen, Daniel Gert D-6 Next-Generation Synthetic Data Techniques for Training, Evaluation, and Prototyping in Audio Signal Processing
Nieto, Oriol P3-21 SILA: Signal-to-Language Augmentation for Enhanced Control in Text-to-Audio Generation
Nishigori, Shuichiro P2-13 Schrödinger Bridge Consistency Trajectory Models for Speech Enhancement
Niu, Ryan P4-23 Head-Related Transfer Function Individualization Using Anthropometric Features and Spatially Independent Latent Representation
Noufi, Camille P1-11 The Test of Auditory-Vocal Affect (TAVA) dataset
Novack, Zachary P1-20 Fast Text-to-Audio Generation with Adversarial Post-Training
Nugraha, Aditya Arie P3-2 Physically Informed Spatial Regularization for Sound Event Localization and Detection
O
Okamoto, Takuma P3-24 SFC-L1: Sound Field Control With Least Absolute Deviation Regression
Oliveira, Danilo D-8 Demonstration of LipDiffuser: Lip-to-Speech Generation with Conditional Diffusion Models Running on a Portable Device
Ono, Nobutaka P4-1 Incremental Averaging Method to Improve Graph-Based Time-Difference-of-Arrival Estimation
Østergaard, Jan P4-2 Learning Robust Spatial Representations from Binaural Audio through Feature Distillation
P
Paissan, Francesco P4-11 FasTUSS: Faster Task-Aware Unified Source Separation
Pandey, Ashutosh P4-8 Controlling the Parameterized Multi-channel Wiener Filter using a tiny neural network
P2-14 A Unified Framework for Evaluating DNN-Based Feedforward, Feedback, and Hybrid Active Noise Cancellation
Park, Sungjoon P1-17 Learning Perceptually Relevant Temporal Envelope Morphing
Parker, Julian P1-20 Fast Text-to-Audio Generation with Adversarial Post-Training
Parker, Karen J. P1-11 The Test of Auditory-Vocal Affect (TAVA) dataset
Passoni, Riccardo P1-22 Diffused Responsibility: Analyzing the Energy Consumption of Generative Text-to-Audio Diffusion Models
Pauwels, Johan P3-17 Beyond Architecture: The Critical Impact of Inference Overlap on Music Source Separation Benchmarks
Peer, Tal D-8 Demonstration of LipDiffuser: Lip-to-Speech Generation with Conditional Diffusion Models Running on a Portable Device
Pereira, Igor P4-17 Moises-Light: Resource-efficient Band-split U-Net For Music Source Separation
P3-19 Generating Separated Singing Vocals Using a Diffusion Model Conditioned on Music Mixtures
Pezzarossa, Luca P3-8 Adaptive Slimming for Scalable and Efficient Speech Enhancement
Pezzoli, Mirco P3-4 Physics-Informed Transfer Learning for Data-Driven Sound Source Reconstruction in Near-Field Acoustic Holography
P4-5 Low-Rank Adaptation of Deep Prior Neural Networks For Room Impulse Response Reconstruction
Pia, Nicola P3-12 UBGAN: Enhancing Coded Speech with Blind and Guided Bandwidth Extension
Pieper, Jaden P2-16 Frequency-Domain Signal-to-Noise Ratios Illuminate the Effects of the Spectral Consistency Constraint and Griffin-Lim Algorithms
Pilataki, Mary P4-20 Self-Supervised Representation Learning with a JEPA Framework for Multi-instrument Music Transcription
Pind, Finnur D-6 Next-Generation Synthetic Data Techniques for Training, Evaluation, and Prototyping in Audio Signal Processing
Plaja-Roglans, Genís P3-19 Generating Separated Singing Vocals Using a Diffusion Model Conditioned on Music Mixtures
Plumbley, Mark D. P3-18 Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models
D-7 Speech Removal Framework for Privacy-preserving Audio Recordings
Politis, Archontis P1-14 Multi-Utterance Speech Separation and Association Trained on Short Segments
Polzehl, Tim P2-12 Two Views, One Truth: Spectral and Self-Supervised Features Fusion for Robust Speech Deepfake Detection
Pons, Jordi P1-20 Fast Text-to-Audio Generation with Adversarial Post-Training
Primus, Paul O2-2, P2-2 TACOS: Temporally-Aligned Audio Captions for Language-Audio Pretraining
R
Reiss, Joshua D. O2-5, P2-5 Modulation Discovery with Differentiable Digital Signal Processing
Ribeiro, Juliano G. C. O1-1, P1-1 Kernel Ridge Regression Based Sound Field Estimation using a Rigid Spherical Microphone Array
Richard, Gaël P3-2 Physically Informed Spatial Regularization for Sound Event Localization and Detection
Richter, Julius D-8 Demonstration of LipDiffuser: Lip-to-Speech Generation with Conditional Diffusion Models Running on a Portable Device
Ritter-Gutierrez, Fabian P2-12 Two Views, One Truth: Spectral and Self-Supervised Features Fusion for Robust Speech Deepfake Detection
Roden, Reinhild P4-4 Soft-Constrained Spatially Selective Active Noise Control for Open-fitting Hearables
Roman, Adrian S. P1-9 Latent Acoustic Mapping for Direction of Arrival Estimation: A Self-Supervised Approach
Roman, Iran R. P1-9 Latent Acoustic Mapping for Direction of Arrival Estimation: A Self-Supervised Approach
Ronchini, Francesca P1-22 Diffused Responsibility: Analyzing the Energy Consumption of Generative Text-to-Audio Diffusion Models
Rowe, David P4-13 RADE: A Neural Codec for Transmitting Speech over HF Radio Channels
Roy Choudhury, Romit P1-12 Unsupervised Multi-channel Speech Dereverberation via Diffusion
S
Sach, Marvin P3-7 EffDiffSE: Efficient Diffusion-Based Frequency-Domain Speech Enhancement with Hybrid Discriminative and Generative DNNs
Saijo, Kohei P4-11 FasTUSS: Faster Task-Aware Unified Source Separation
P3-16 Is MixIT Really Unsuitable for Correlated Sources? Exploring MixIT for Unsupervised Pre-training in Music Source Separation
Saito, Koichi P2-13 Schrödinger Bridge Consistency Trajectory Models for Speech Enhancement
Salamon, Justin P3-21 SILA: Signal-to-Language Augmentation for Enhanced Control in Text-to-Audio Generation
Sandler, Mark B. P3-17 Beyond Architecture: The Critical Impact of Inference Overlap on Music Source Separation Benchmarks
D-1 Neural Audio Synthesis for Non-Keyboard Instruments
Sarti, Augusto P3-4 Physics-Informed Transfer Learning for Data-Driven Sound Source Reconstruction in Near-Field Acoustic Holography
Sato, Ryo O4-4, P4-12 Context-Aware Query Refinement for Target Sound Extraction: Handling Partially Matched Queries
Sayin, Umut P4-6 MB-RIRs: a Synthetic Room Impulse Response Dataset with Frequency-Dependent Absorption Coefficients
P1-19 Conditional Wave-U-Net for Acoustic Matching in Shared XR Environments
Scheibler, Robin O4-3, P4-9 Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration
P3-9 ReverbMiipher: Generative Speech Restoration meets Reverberation Characteristics Controllability
P2-17 Source Separation by Flow Matching
Schlecht, Sebastian J. P2-11 Neural-Network-Based Interpolation of Late Reverberation in Coupled Spaces Using the Common Slopes Model
Schmid, Florian O2-2, P2-2 TACOS: Temporally-Aligned Audio Captions for Language-Audio Pretraining
Seetharaman, Prem P3-21 SILA: Signal-to-Language Augmentation for Enhanced Control in Text-to-Audio Generation
Serizel, Romain P1-22 Diffused Responsibility: Analyzing the Energy Consumption of Generative Text-to-Audio Diffusion Models
Serra, Xavier P4-6 MB-RIRs: a Synthetic Room Impulse Response Dataset with Frequency-Dependent Absorption Coefficients
P3-19 Generating Separated Singing Vocals Using a Diffusion Model Conditioned on Music Mixtures
Shi, Renzheng P3-7 EffDiffSE: Efficient Diffusion-Based Frequency-Domain Speech Enhancement with Hybrid Discriminative and Generative DNNs
Shim, Hye-jin O2-4, P2-4 OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder
Shu, Hongzhi P1-23 Benchmarking Sub-Genre Classification for Mainstage Dance Music
Simón Gálvez, Marcos F. P3-23 Theoretical Analysis of Recursive Implementations of Multi-Channel Cross-Talk Cancellation Systems
Singh, Arshdeep D-7 Speech Removal Framework for Privacy-preserving Audio Recordings
Smaragdis, Paris P3-8 Adaptive Slimming for Scalable and Efficient Speech Enhancement
O3-1, P2-20 Rethinking Non-Negative Matrix Factorization with Implicit Neural Representations
P4-22 Learning to Upsample and Upmix Audio in the Latent Domain
O4-5, P4-19 Combolutional Neural Networks
Song, Zeyang P4-10 Long-Context Modeling Networks for Monaural Speech Enhancement: A Comparative Study
Sontacchi, Alois P3-18 Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models
Souchaud, Antoine R. P3-22 Perceptually-Driven Panning for an Extended Listening Area
Souden, Mehrez O3-1, P2-20 Rethinking Non-Negative Matrix Factorization with Implicit Neural Representations
Srinivas, Shanmukha P4-8 Controlling the Parameterized Multi-channel Wiener Filter using a tiny neural network
Stahl, Benjamin P3-18 Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models
Stamatiadis, Paraskevas P3-10 IS³: Generic Impulsive–Stationary Sound Separation in Acoustic Scenes using Deep Filtering
Strauß, Martin P2-15 Robust Speech Activity Detection in the Presence of Singing Voice
Su, Jiaqi P4-21 DiTVC: One-Shot Voice Conversion via Diffusion Transformer with Environment and Speaking Rate Cloning
Subramani, Krishna O3-1, P2-20 Rethinking Non-Negative Matrix Factorization with Implicit Neural Representations
T
Takahashi, Akira P2-10 SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet
Takahashi, Shusuke P2-10 SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet
P2-13 Schrödinger Bridge Consistency Trajectory Models for Speech Enhancement
Tan, Hao Hao O2-5, P2-5 Modulation Discovery with Differentiable Digital Signal Processing
Tan, Zheng-Hua P4-2 Learning Robust Spatial Representations from Binaural Audio through Feature Distillation
Taylor, Josiah P1-20 Fast Text-to-Audio Generation with Adversarial Post-Training
Tenbrinck, Daniel P1-16 VoxATtack: A Multimodal Attack on Voice Anonymization Systems
Tirry, Wouter P3-7 EffDiffSE: Efficient Diffusion-Based Frequency-Domain Speech Enhancement with Hybrid Discriminative and Generative DNNs
Töpken, Stephan O4-1, P4-3 Beamforming with Interaural Time-To-Level Difference Conversion for Hearing Loss Compensation
Tourbabin, Vladimir P1-13 Microphone Occlusion Mitigation for Own-Voice Enhancement in Head-Worn Microphone Arrays Using Switching-Adaptive Beamforming
Toyama, Keisuke P2-10 SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet
Trevino, Jorge O1-1, P1-1 Kernel Ridge Regression Based Sound Field Estimation using a Rigid Spherical Microphone Array
Tuna, Cagdas P4-15 Device-Centric Room Impulse Response Augmentation Evaluated on Room Geometry Inference
U
Unoki, Masashi P4-16 Modeling Multi-Level Hearing Loss for Speech Intelligibility Prediction
V
Vaidyanathapuram Krishnan, Venkatakrishnan P4-14 The Perception of Phase Intercept Distortion and its Application in Data Augmentation
Valin, Jean-Marc P4-13 RADE: A Neural Codec for Transmitting Speech over HF Radio Channels
O3-3, P2-22 A Lightweight and Robust Method for Blind Wideband-to-Fullband Extension of Speech
van de Par, Steven O4-1, P4-3 Beamforming with Interaural Time-To-Level Difference Conversion for Hearing Loss Compensation
Verhulst, Sarah P3-14 Low-Complexity Individualized Noise Reduction for Real-Time Processing
Veronesi, Francesco P3-23 Theoretical Analysis of Recursive Implementations of Multi-Channel Cross-Talk Cancellation Systems
Virtanen, Tuomas P1-14 Multi-Utterance Speech Separation and Association Trained on Short Segments
Voran, Stephen P2-16 Frequency-Domain Signal-to-Noise Ratios Illuminate the Effects of the Spectral Consistency Constraint and Griffin-Lim Algorithms
W
Walther, Andreas P3-3 Stereo Reproduction in the Presence of Sample Rate Offsets
P4-15 Device-Centric Room Impulse Response Augmentation Evaluated on Room Geometry Inference
Wang, Deliang P2-14 A Unified Framework for Evaluating DNN-Based Feedforward, Feedback, and Hybrid Active Noise Cancellation
Wang, Helin O2-1, P2-1 FlexSED: Towards Open-Vocabulary Sound Event Detection
Wang, Ju-Chiang P2-8 Temporal Adaptation of Pre-trained Foundation Models for Music Structure Analysis
Wang, Shan Xiang O3-5, P2-24 Unveiling the Best Practices for Applying Speech Foundation Models to Speech Intelligibility Prediction for Hearing-Impaired People
Wang, Yunyun P4-21 DiTVC: One-Shot Voice Conversion via Diffusion Transformer with Environment and Speaking Rate Cloning
Wang, Yuzhu P1-14 Multi-Utterance Speech Separation and Association Trained on Short Segments
Wang, Zhong-Qiu P1-12 Unsupervised Multi-channel Speech Dereverberation via Diffusion
Watanabe, Shinji O2-4, P2-4 OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder
P4-2 Learning Robust Spatial Representations from Binaural Audio through Feature Distillation
Wen, Chuan P3-14 Low-Complexity Individualized Noise Reduction for Real-Time Processing
Wichern, Gordon P4-11 FasTUSS: Faster Task-Aware Unified Source Separation
O1-3, P1-3 Physics-Informed Direction-Aware Neural Acoustic Fields
Widmer, Gerhard O2-2, P2-2 TACOS: Temporally-Aligned Audio Captions for Language-Audio Pretraining
Wilkins, Julia P3-20 Balancing Information Preservation and Disentanglement in Self-Supervised Music Representation Learning
Williamson, Donald P3-15 JSQA: Speech Quality Assessment with Perceptually-Inspired Contrastive Pretraining Based on JND Audio Pairs
Wong, Daniel D. E. P2-14 A Unified Framework for Evaluating DNN-Based Feedforward, Feedback, and Hybrid Active Noise Cancellation
P1-13 Microphone Occlusion Mitigation for Own-Voice Enhancement in Head-Worn Microphone Arrays Using Switching-Adaptive Beamforming
Wu, Junkai O2-3, P2-3 Bridging Ears and Eyes: Analyzing Audio and Visual Large Language Models to Humans in Visible Sound Recognition and Reducing Their Sensory Gap via Cross-Modal Distillation
Wu, Yulun P1-12 Unsupervised Multi-channel Speech Dereverberation via Diffusion
X
Xiao, Tong P4-4 Soft-Constrained Spatially Selective Active Noise Control for Open-fitting Hearables
Xu, Buye P2-14 A Unified Framework for Evaluating DNN-Based Feedforward, Feedback, and Hybrid Active Noise Cancellation
Xu, Liang O4-2, P4-7 Robust One-step Speech Enhancement via Consistency Distillation
Xu, Yangfei P3-11 Hybrid-Sep: Language-queried audio source separation via pre-trained Model Fusion and Adversarial Consistent Training
Xu, Zhongweiyang P1-12 Unsupervised Multi-channel Speech Dereverberation via Diffusion
Y
Yamaoka, Kouei P4-1 Incremental Averaging Method to Improve Graph-Based Time-Difference-of-Arrival Estimation
Yan, Longfei Felix O4-2, P4-7 Robust One-step Speech Enhancement via Consistency Distillation
Yang, Gene-Ping P1-10 Distributed Asynchronous Device Speech Enhancement via Windowed Cross-Attention
Yang, Li-Chia D-4 Real-Time System for Audio-Visual Target Speech Enhancement
Yeh, Chunghsin P1-4 Room Impulse Response Generation Conditioned on Acoustic Parameters
Yin, Sile D-4 Real-Time System for Audio-Visual Target Speech Enhancement
Yoshii, Kazuyoshi P3-2 Physically Informed Spatial Regularization for Sound Event Localization and Detection
Yu, Chin-Yun P4-18 Improving Inference-Time Optimisation for Vocal Effects Style Transfer with a Gaussian Prior
D-2 PCA-DiffVox: Augmenting Vocal Effects Tweakability With a Bijective Latent Space
Koizumi, Yuma P3-9 ReverbMiipher: Generative Speech Restoration meets Reverberation Characteristics Controllability
Z
Zen, Heiga O4-3, P4-9 Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration
P3-9 ReverbMiipher: Generative Speech Restoration meets Reverberation Characteristics Controllability
Zhang, Qiquan P4-10 Long-Context Modeling Networks for Monaural Speech Enhancement: A Comparative Study
Zhang, Shuo D-4 Real-Time System for Audio-Visual Target Speech Enhancement
Zhang, Xiangyu P4-10 Long-Context Modeling Networks for Monaural Speech Enhancement: A Comparative Study
Zhang, Yixiao P2-8 Temporal Adaptation of Pre-trained Foundation Models for Music Structure Analysis
Zhang, You O1-5, P1-6 Towards Perception-Informed Latent HRTF Representations
Zhong, Zhi P2-10 SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet
Zhou, Haoshuai O3-5, P2-24 Unveiling the Best Practices for Applying Speech Foundation Models to Speech Intelligibility Prediction for Hearing-Impaired People
Zhou, Xiajie P4-16 Modeling Multi-Level Hearing Loss for Speech Intelligibility Prediction
Zukowski, Zack P1-20 Fast Text-to-Audio Generation with Adversarial Post-Training