A
| Aihara, Ryo |
P4-11 |
FasTUSS: Faster Task-Aware Unified Source Separation |
| Akiyama, Hitoshi |
O1-1, P1-1 |
Kernel Ridge Regression Based Sound Field Estimation using a Rigid Spherical Microphone Array |
| Al-Sinan, Adnan |
P1-20 |
Fast Text-to-Audio Generation with Adversarial Post-Training |
| Aloradi, Ahmad |
P1-16 |
VoxATtack: A Multimodal Attack on Voice Anonymization Systems |
| Amengual Gari, Sebastia V. |
P3-5 |
Scene-wide Acoustic Parameter Estimation |
| Amruthalingam, Ludovic |
O3-4, P2-23 |
Listening Intention Estimation for Hearables with Natural Behavior Cues |
| Ananthabhotla, Ishwarya |
P3-5 |
Scene-wide Acoustic Parameter Estimation |
O1-5, P1-6 |
Towards Perception-Informed Latent HRTF Representations |
| Antonacci, Fabio |
P1-22 |
Diffused Responsibility: Analyzing the Energy Consumption of Generative Text-to-Audio Diffusion Models |
P3-4 |
Physics-Informed Transfer Learning for Data-Driven Sound Source Reconstruction in Near-Field Acoustic Holography |
P4-5 |
Low-Rank Adaptation of Deep Prior Neural Networks For Room Impulse Response Reconstruction |
| Archer-Boyd, Alan W. |
P3-17 |
Beyond Architecture: The Critical Impact of Inference Overlap on Music Source Separation Benchmarks |
| Arellano, Silvia |
P1-4 |
Room Impulse Response Generation Conditioned on Acoustic Parameters |
| Aroudi, Ali |
P4-8 |
Controlling the Parameterized Multi-channel Wiener Filter using a tiny neural network |
P1-13 |
Microphone Occlusion Mitigation for Own-Voice Enhancement in Head-Worn Microphone Arrays Using Switching-Adaptive Beamforming |
| Arteaga, Daniel |
P1-4 |
Room Impulse Response Generation Conditioned on Acoustic Parameters |
| Azcarreta, Juan |
P4-8 |
Controlling the Parameterized Multi-channel Wiener Filter using a tiny neural network |
B
| Bacchiani, Michiel |
O4-3, P4-9 |
Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration |
P3-9 |
ReverbMiipher: Generative Speech Restoration meets Reverberation Characteristics Controllability |
| Badeau, Roland |
P3-10 |
IS³: Generic Impulsive–Stationary Sound Separation in Acoustic Scenes using Deep Filtering
| Bagheri Sereshki, Saeed |
P1-13 |
Microphone Occlusion Mitigation for Own-Voice Enhancement in Head-Worn Microphone Arrays Using Switching-Adaptive Beamforming |
| Bando, Yoshiaki |
P3-16 |
Is MixIT Really Unsuitable for Correlated Sources? Exploring MixIT for Unsupervised Pre-training in Music Source Separation |
| Baumann, Pascal |
O3-4, P2-23 |
Listening Intention Estimation for Hearables with Natural Behavior Cues |
| Bäumer, Timm-Jonas |
O4-1, P4-3 |
Beamforming with Interaural Time-To-Level Difference Conversion for Hearing Loss Compensation |
| Bavu, Éric |
D-3 |
Real-Time Speech Enhancement in Noise for Throat Microphone Using Neural Audio Codec as Foundation Model |
| Becker, Luca |
P3-13 |
Contrastive Representation Learning for Privacy-Preserving Fine-Tuning of Audio-Visual Speech Recognition |
| Bello, Juan Pablo |
P3-20 |
Balancing Information Preservation and Disentanglement in Self-Supervised Music Representation Learning |
P1-9 |
Latent Acoustic Mapping for Direction of Arrival Estimation: A Self-Supervised Approach |
| Benetos, Emmanouil |
P1-24 |
RUMAA: Repeat-Aware Unified Music Audio Analysis for Score-Performance Alignment, Transcription, and Mistake Detection |
| Bereuter, Paul A. |
P3-18 |
Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models |
| Berg-Kirkpatrick, Taylor |
P1-20 |
Fast Text-to-Audio Generation with Adversarial Post-Training |
| Berger, Clémentine |
P3-10 |
IS³: Generic Impulsive–Stationary Sound Separation in Acoustic Scenes using Deep Filtering
| Berger, Jonathan |
P1-11 |
The Test of Auditory-Vocal Affect (TAVA) dataset |
| Bharadwaj, Shikhar |
O2-4, P2-4 |
OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder |
| Bhattacharya, Gautam |
P1-4 |
Room Impulse Response Generation Conditioned on Acoustic Parameters |
| Bibbó, Gabriel |
D-7 |
Speech Removal Framework for Privacy-preserving Audio Recordings |
| Bilen, Cagdas |
P4-8 |
Controlling the Parameterized Multi-channel Wiener Filter using a tiny neural network |
| Blau, Matthias |
P4-4 |
Soft-Constrained Spatially Selective Active Noise Control for Open-fitting Hearables |
| Bologni, Giovanni |
P3-6 |
Cyclic Multichannel Wiener Filter for Acoustic Beamforming |
| Bovbjerg, Holger Severin |
P4-2 |
Learning Robust Spatial Representations from Binaural Audio through Feature Distillation |
| Bowling, Daniel L. |
P1-11 |
The Test of Auditory-Vocal Affect (TAVA) dataset |
| Bralios, Dimitrios |
P4-22 |
Learning to Upsample and Upmix Audio in the Latent Domain |
| Braun, Sebastian |
P1-10 |
Distributed Asynchronous Device Speech Enhancement via Windowed Cross-Attention |
| Brendel, Andreas |
P3-12 |
UBGAN: Enhancing Coded Speech with Blind and Guided Bandwidth Extension |
| Brümann, Klaus |
P4-1 |
Incremental Averaging Method to Improve Graph-Based Time-Difference-of-Arrival Estimation |
| Burgoyne, John Ashley |
P2-18 |
Musical Source Separation Bake-Off: Comparing Objective Metrics with Human Perception |
| Büthe, Jan |
O3-3, P2-22 |
A Lightweight and Robust Method for Blind Wideband-to-Fullband Extension of Speech |
C
| Calamia, Paul |
O1-5, P1-6 |
Towards Perception-Informed Latent HRTF Representations |
| Cao, Boxuan |
O3-5, P2-24 |
Unveiling the Best Practices for Applying Speech Foundation Models to Speech Intelligibility Prediction for Hearing-Impaired People |
| Carr, CJ |
P1-20 |
Fast Text-to-Audio Generation with Adversarial Post-Training |
| Casebeer, Jonah |
P4-22 |
Learning to Upsample and Upmix Audio in the Latent Domain |
| Caspe, Franco Santiago |
D-1 |
Neural Audio Synthesis for Non-Keyboard Instruments |
| Chang, Sungkyun |
P1-24 |
RUMAA: Repeat-Aware Unified Music Audio Analysis for Score-Performance Alignment, Transcription, and Mistake Detection |
| Chen, Haonan |
P2-8 |
Temporal Adaptation of Pre-trained Foundation Models for Music Structure Analysis |
| Chen, Jianchong |
P1-12 |
Unsupervised Multi-channel Speech Dereverberation via Diffusion |
| Chen, Jitong |
P2-8 |
Temporal Adaptation of Pre-trained Foundation Models for Music Structure Analysis |
| Chen, Ke |
P4-21 |
DiTVC: One-Shot Voice Conversion via Diffusion Transformer with Environment and Speaking Rate Cloning |
| Chen, Moran |
P4-10 |
Long-Context Modeling Networks for Monaural Speech Enhancement: A Comparative Study |
| Choi, Kwanghee |
O2-4, P2-4 |
OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder |
| Choudhari, Vishal |
O2-3, P2-3 |
Bridging Ears and Eyes: Analyzing Audio and Visual Large Language Models to Humans in Visible Sound Recognition and Reducing Their Sensory Gap via Cross-Modal Distillation |
| Churchwell, Cameron |
O4-5, P4-19 |
Combolutional Neural Networks |
| Cohen, Israel |
O1-2, P1-2 |
Optimal Region-of-Interest Beamforming for Audio Conferencing with Dual Perpendicular Sparse Circular Sectors |
| Comanducci, Luca |
P1-22 |
Diffused Responsibility: Analyzing the Energy Consumption of Generative Text-to-Audio Diffusion Models |
| Condit-Schultz, Nathaniel |
P4-14 |
The Perception of Phase Intercept Distortion and its Application in Data Augmentation |
| Corey, Ryan |
D-5 |
Wireless Group Conversation Enhancement with the Tympan Open-Source Hearing Aid Platform |
| Cornell, Samuele |
O2-4, P2-4 |
OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder |
| Cui, Shuyang |
P2-10 |
SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet |
| Cvetkovic, Zoran |
P2-11 |
Neural-Network-Based Interpolation of Late Reverberation in Coupled Spaces Using the Common Slopes Model |
P3-22 |
Perceptually-Driven Panning for an Extended Listening Area |
D
| Dal Santo, Gloria |
P2-11 |
Neural-Network-Based Interpolation of Late Reverberation in Coupled Spaces Using the Common Slopes Model |
| Das, Arnab |
P2-12 |
Two Views, One Truth: Spectral and Self-Supervised Features Fusion for Robust Speech Deepfake Detection |
| Das, Orchisama |
P2-11 |
Neural-Network-Based Interpolation of Late Reverberation in Coupled Spaces Using the Common Slopes Model |
| De Sena, Enzo |
P3-22 |
Perceptually-Driven Panning for an Extended Listening Area |
| de Vries, Johannes W. |
O4-1, P4-3 |
Beamforming with Interaural Time-To-Level Difference Conversion for Hearing Loss Compensation |
| Deacon, Thomas |
D-7 |
Speech Removal Framework for Privacy-preserving Audio Recordings |
| Deshmukh, Soham |
O2-4, P2-4 |
OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder |
| Di Carlo, Diego |
P3-2 |
Physically Informed Spatial Regularization for Sound Event Localization and Detection |
| Ding, Sivan |
P3-20 |
Balancing Information Preservation and Disentanglement in Self-Supervised Music Representation Learning |
| Dixit, Satvik |
P1-17 |
Learning Perceptually Relevant Temporal Envelope Morphing |
| Dixon, Simon |
P1-24 |
RUMAA: Repeat-Aware Unified Music Audio Analysis for Score-Performance Alignment, Transcription, and Mistake Detection |
P4-20 |
Self-Supervised Representation Learning with a JEPA Framework for Multi-instrument Music Transcription |
| Doclo, Simon |
O1-2, P1-2 |
Optimal Region-of-Interest Beamforming for Audio Conferencing with Dual Perpendicular Sparse Circular Sectors |
P4-1 |
Incremental Averaging Method to Improve Graph-Based Time-Difference-of-Arrival Estimation |
P4-4 |
Soft-Constrained Spatially Selective Active Noise Control for Open-fitting Hearables |
| Doh, Seungheon |
P2-7 |
Can Large Language Models Predict Audio Effects Parameters from Natural Language? |
| Donahue, Chris |
P1-17 |
Learning Perceptually Relevant Temporal Envelope Morphing |
| Donley, Jacob |
P4-8 |
Controlling the Parameterized Multi-channel Wiener Filter using a tiny neural network |
P2-14 |
A Unified Framework for Evaluating DNN-Based Feedforward, Feedback, and Hybrid Active Noise Cancellation |
| Doucet, Arnaud |
P2-17 |
Source Separation by Flow Matching |
| Drossos, Konstantinos |
P1-14 |
Multi-Utterance Speech Separation and Association Trained on Short Segments |
| Duan, Zhiyao |
O1-5, P1-6 |
Towards Perception-Informed Latent HRTF Representations |
E
| El Kheir, Yassine |
P2-12 |
Two Views, One Truth: Spectral and Self-Supervised Features Fusion for Robust Speech Deepfake Detection |
| Elhilali, Mounya |
O2-1, P2-1 |
FlexSED: Towards Open-Vocabulary Sound Event Detection |
P3-1 |
SynSonic: Augmenting Sound Event Detection through Text-to-Audio Diffusion ControlNet and Effective Sample Filtering |
| Erdogan, Enes Erdem |
P2-12 |
Two Views, One Truth: Spectral and Self-Supervised Features Fusion for Robust Speech Deepfake Detection |
| Essid, Slim |
P3-10 |
IS³ : Generic Impulsive–Stationary Sound Separation in Acoustic Scenes using Deep Filtering |
| Evans, Zach |
P1-20 |
Fast Text-to-Audio Generation with Adversarial Post-Training |
F
| Falcon-Perez, Ricardo |
P3-5 |
Scene-wide Acoustic Parameter Estimation |
| Fan, Junyi |
P3-15 |
JSQA: Speech Quality Assessment with Perceptually-Inspired Contrastive Pretraining Based on JND Audio Pairs |
| Fang, Huajian |
O3-2, P2-21 |
Self-Steering Deep Non-Linear Spatially Selective Filters for Efficient Extraction of Moving Speakers under Weak Guidance |
P2-14 |
A Unified Framework for Evaluating DNN-Based Feedforward, Feedback, and Hybrid Active Noise Cancellation |
| Fazekas, George |
P4-18 |
Improving Inference-Time Optimisation for Vocal Effects Style Transfer with a Gaussian Prior |
D-2 |
PCA-DiffVox: Augmenting Vocal Effects Tweakability With a Bijective Latent Space |
| Fazi, Filippo |
P3-23 |
Theoretical Analysis of Recursive Implementations of Multi-Channel Cross-Talk Cancellation Systems |
| Feng, Jianyuan |
P3-11 |
Hybrid-Sep: Language-queried audio source separation via pre-trained Model Fusion and Adversarial Consistent Training |
| Fingscheidt, Tim |
P3-7 |
EffDiffSE: Efficient Diffusion-Based Frequency-Domain Speech Enhancement with Hybrid Discriminative and Generative DNNs |
| Finkelstein, Adam |
P4-21 |
DiTVC: One-Shot Voice Conversion via Diffusion Transformer with Environment and Speaking Rate Cloning |
| Fontaine, Mathieu |
P3-2 |
Physically Informed Spatial Regularization for Sound Event Localization and Detection |
| Franck, Andreas |
P3-23 |
Theoretical Analysis of Recursive Implementations of Multi-Channel Cross-Talk Cancellation Systems |
| Francl, Andrew |
O1-5, P1-6 |
Towards Perception-Informed Latent HRTF Representations |
| Fu, Minghao |
P1-23 |
Benchmarking Sub-Genre Classification for Mainstage Dance Music |
| Fu, Yihui |
P3-7 |
EffDiffSE: Efficient Diffusion-Based Frequency-Domain Speech Enhancement with Hybrid Discriminative and Generative DNNs |
| Fuchs, Guillaume |
P3-12 |
UBGAN: Enhancing Coded Speech with Blind and Guided Bandwidth Extension |
| Fuentes, Magdalena |
P1-21 |
Post-Training Quantization for Audio Diffusion Transformers |
P3-20 |
Balancing Information Preservation and Disentanglement in Self-Supervised Music Representation Learning |
| Fukayama, Satoru |
O2-4, P2-4 |
OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder |
G
| Gao, Ruohan |
P3-5 |
Scene-wide Acoustic Parameter Estimation |
O1-5, P1-6 |
Towards Perception-Informed Latent HRTF Representations |
| Gaznepoglu, Ünal Ege |
P1-16 |
VoxATtack: A Multimodal Attack on Voice Anonymization Systems |
| Gerkmann, Timo |
O3-2, P2-21 |
Self-Steering Deep Non-Linear Spatially Selective Filters for Efficient Extraction of Moving Speakers under Weak Guidance |
D-8 |
Demonstration of LipDiffuser: Lip-to-Speech Generation with Conditional Diffusion Models Running on a Portable Device |
| Germain, François G. |
P4-11 |
FasTUSS: Faster Task-Aware Unified Source Separation |
O1-3, P1-3 |
Physics-Informed Direction-Aware Neural Acoustic Fields |
| Giurda, Ruksana |
O3-4, P2-23 |
Listening Intention Estimation for Hearables with Natural Behavior Cues |
| Glaus, Seraina |
O3-4, P2-23 |
Listening Intention Estimation for Hearables with Natural Behavior Cues |
| Gómez-Cañón, Juan S. |
P1-11 |
The Test of Auditory-Vocal Affect (TAVA) dataset |
| Götz, Georg |
D-6 |
Next-Generation Synthetic Data Techniques for Training, Evaluation, and Prototyping in Audio Signal Processing |
| Grinstein, Eric |
P4-8 |
Controlling the Parameterized Multi-channel Wiener Filter using a tiny neural network |
| Gröger, Fabian |
O3-4, P2-23 |
Listening Intention Estimation for Hearables with Natural Behavior Cues |
| Grundhuber, Philipp |
P2-15 |
Robust Speech Activity Detection in the Presence of Singing Voice |
| Guevara-Rukoz, Adriana |
P3-9 |
ReverbMiipher: Generative Speech Restoration meets Reverberation Characteristics Controllability |
| Guo, Weizhe |
O2-1, P2-1 |
FlexSED: Towards Open-Vocabulary Sound Event Detection |
| Gupta, Kishan |
P3-12 |
UBGAN: Enhancing Coded Speech with Blind and Guided Bandwidth Extension |
| Gusó, Enric |
P4-6 |
MB-RIRs: a Synthetic Room Impulse Response Dataset with Frequency-Dependent Absorption Coefficients |
P1-19 |
Conditional Wave-U-Net for Acoustic Matching in Shared XR Environments |
H
| Habets, Emanuël A. P. |
P3-3 |
Stereo Reproduction in the Presence of Sample Rate Offsets |
P1-16 |
VoxATtack: A Multimodal Attack on Voice Anonymization Systems |
P4-15 |
Device-Centric Room Impulse Response Augmentation Evaluated on Room Geometry Inference |
P2-15 |
Robust Speech Activity Detection in the Presence of Singing Voice |
| Hai, Jiarui |
O2-1, P2-1 |
FlexSED: Towards Open-Vocabulary Sound Event Detection |
P3-1 |
SynSonic: Augmenting Sound Event Detection through Text-to-Audio Diffusion ControlNet and Effective Sample Filtering |
| Halimeh, Mhd Modar |
P2-15 |
Robust Speech Activity Detection in the Presence of Singing Voice |
| Harju, Manu |
P1-8 |
Sound Event Detection with Audio-Text Models and Heterogeneous Temporal Annotations |
| Haruta, Chiho |
O4-4, P4-12 |
Context-Aware Query Refinement for Target Sound Extraction: Handling Partially Matched Queries |
| Hauret, Julien |
D-3 |
Real-Time Speech Enhancement in Noise for Throat Microphone Using Neural Audio Codec as Foundation Model |
| Heller, Laurie M. |
P1-17 |
Learning Perceptually Relevant Temporal Envelope Morphing |
| Hendriks, Richard C. |
P3-6 |
Cyclic Multichannel Wiener Filter for Acoustic Beamforming |
O4-1, P4-3 |
Beamforming with Interaural Time-To-Level Difference Conversion for Hearing Loss Compensation |
| Hennequin, Romain |
P1-18 |
Multi-Class-Token Transformer for Multitask Self-supervised Music Information Retrieval |
| Hershey, John R. |
P2-17 |
Source Separation by Flow Matching |
| Heusdens, Richard |
P3-6 |
Cyclic Multichannel Wiener Filter for Acoustic Beamforming |
O4-1, P4-3 |
Beamforming with Interaural Time-To-Level Difference Conversion for Hearing Loss Compensation |
| Higuchi, Takuya |
O3-1, P2-20 |
Rethinking Non-Negative Matrix Factorization with Implicit Neural Representations |
| Hirano, Masato |
P2-13 |
Schrödinger Bridge Consistency Trajectory Models for Speech Enhancement |
| Hiruma, Nobuhiko |
O4-4, P4-12 |
Context-Aware Query Refinement for Target Sound Extraction: Handling Partially Matched Queries |
| Hsieh, Tsun-An |
P2-19 |
TGIF: Talker Group-Informed Familiarization of Target Speaker Extraction |
| Hung, Yun-Ning (Amy) |
P4-17 |
Moises-Light: Resource-efficient Band-split U-Net For Music Source Separation |
P3-19 |
Generating Separated Singing Vocals Using a Diffusion Model Conditioned on Music Mixtures |
I
| Ick, Christopher |
O1-3, P1-3 |
Physics-Informed Direction-Aware Neural Acoustic Fields |
| Imoto, Keisuke |
O4-4, P4-12 |
Context-Aware Query Refinement for Target Sound Extraction: Handling Partially Matched Queries |
| Iodice, Gian Marco |
P1-20 |
Fast Text-to-Audio Generation with Adversarial Post-Training |
| Ishikawa, Haruko |
O4-3, P4-9 |
Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration |
P3-9 |
ReverbMiipher: Generative Speech Restoration meets Reverberation Characteristics Controllability |
| Itou, Hiroaki |
O1-4, P1-5 |
Source and Sensor Placement for Sound Field Control Based on Mean Square Error with Prior Spatial Covariance |
| Itzhak, Gal |
O1-2, P1-2 |
Optimal Region-of-Interest Beamforming for Audio Conferencing with Dual Perpendicular Sparse Circular Sectors |
J
| Jaffe, Noah |
P2-18 |
Musical Source Separation Bake-Off: Comparing Objective Metrics with Human Perception |
| Jang, Inseon |
P1-15 |
Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine |
| Jensen, Jesper |
P4-2 |
Learning Robust Spatial Representations from Binaural Audio through Feature Distillation |
| Jiang, Hongyu |
P1-23 |
Benchmarking Sub-Genre Classification for Mainstage Dance Music |
| Jiang, Xilin |
O2-3, P2-3 |
Bridging Ears and Eyes: Analyzing Audio and Visual Large Language Models to Humans in Visible Sound Recognition and Reducing Their Sensory Gap via Cross-Modal Distillation |
| Jin, Zeyu |
P4-21 |
DiTVC: One-Shot Voice Conversion via Diffusion Transformer with Environment and Speaking Rate Cloning |
| Joubaud, Thomas |
D-3 |
Real-Time Speech Enhancement in Noise for Throat Microphone Using Neural Audio Codec as Foundation Model |
K
| Kamado, Noriyoshi |
O1-4, P1-5 |
Source and Sensor Placement for Sound Field Control Based on Mean Square Error with Prior Spatial Covariance |
| Karita, Shigeki |
O4-3, P4-9 |
Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration |
P3-9 |
ReverbMiipher: Generative Speech Restoration meets Reverberation Characteristics Controllability |
| Khandelwal, Tanmay |
P1-21 |
Post-Training Quantization for Audio Diffusion Transformers |
| Khera, Harnick |
P3-17 |
Beyond Architecture: The Critical Impact of Inference Overlap on Music Source Separation Benchmarks |
| Kienegger, Jakob |
O3-2, P2-21 |
Self-Steering Deep Non-Linear Spatially Selective Filters for Efficient Extraction of Moving Speakers under Weak Guidance |
| Kim, Minje |
P3-8 |
Adaptive Slimming for Scalable and Efficient Speech Enhancement |
P2-19 |
TGIF: Talker Group-Informed Familiarization of Target Speaker Extraction |
P1-15 |
Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine |
O4-5, P4-19 |
Combolutional Neural Networks |
| Klapuri, Anssi |
O2-6, P2-6 |
Transcribing Rhythmic Patterns of the Guitar Track in Polyphonic Music |
| Kleijn, W. Bastiaan |
O4-2, P4-7 |
Robust One-step Speech Enhancement via Consistency Distillation |
| Koizumi, Yuma |
O4-3, P4-9 |
Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration |
| Kong, Yuexuan |
P1-18 |
Multi-Class-Token Transformer for Multitask Self-supervised Music Information Retrieval |
| Koo, Junghyun |
P4-18 |
Improving Inference-Time Optimisation for Vocal Effects Style Transfer with a Gaussian Prior |
P2-7 |
Can Large Language Models Predict Audio Effects Parameters from Natural Language? |
D-2 |
PCA-DiffVox: Augmenting Vocal Effects Tweakability With a Bijective Latent Space |
| Korse, Srikanth |
P3-3 |
Stereo Reproduction in the Presence of Sample Rate Offsets |
P3-12 |
UBGAN: Enhancing Coded Speech with Blind and Guided Bandwidth Extension |
| Korzeniowski, Filip |
P4-17 |
Moises-Light: Resource-efficient Band-split U-Net For Music Source Separation |
| Koyama, Shoichi |
P4-23 |
Head-Related Transfer Function Individualization Using Anthropometric Features and Spatially Independent Latent Representation |
P4-5 |
Low-Rank Adaptation of Deep Prior Neural Networks For Room Impulse Response Reconstruction |
O1-4, P1-5 |
Source and Sensor Placement for Sound Field Control Based on Mean Square Error with Prior Spatial Covariance |
| Kozuka, Shihori |
O1-4, P1-5 |
Source and Sensor Placement for Sound Field Control Based on Mean Square Error with Prior Spatial Covariance |
| Kumar, Rithesh |
P4-21 |
DiTVC: One-Shot Voice Conversion via Diffusion Transformer with Environment and Speaking Rate Cloning |
| Kumar, Sonal |
P3-21 |
SILA: Signal-to-Language Augmentation for Enhanced Control in Text-to-Audio Generation |
| Kusaka, Yuta |
P2-9 |
Learn from Virtual Guitar: A Comparative Analysis of Automatic Guitar Transcription using Synthetic and Real Audio |
| Kuznetsova, Anastasia |
P1-15 |
Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine |
L
| Lagrange, Mathieu |
P1-18 |
Multi-Class-Token Transformer for Multitask Self-supervised Music Information Retrieval |
| Laroche, Clement |
P3-8 |
Adaptive Slimming for Scalable and Efficient Speech Enhancement |
| Le Roux, Jonathan |
P4-11 |
FasTUSS: Faster Task-Aware Unified Source Separation |
O1-3, P1-3 |
Physics-Informed Direction-Aware Neural Acoustic Fields |
| Lee, Jung-Suk |
P1-13 |
Microphone Occlusion Mitigation for Own-Voice Enhancement in Head-Worn Microphone Arrays Using Switching-Adaptive Beamforming |
| Lee, Sanha |
P4-8 |
Controlling the Parameterized Multi-channel Wiener Filter using a tiny neural network |
| Li, Cole |
P4-8 |
Controlling the Parameterized Multi-channel Wiener Filter using a tiny neural network |
| Li, Guangzheng |
P3-11 |
Hybrid-Sep: Language-queried audio source separation via pre-trained Model Fusion and Adversarial Consistent Training |
| Li, Haizhou |
P4-10 |
Long-Context Modeling Networks for Monaural Speech Enhancement: A Comparative Study |
| Li, Henry |
P2-17 |
Source Separation by Flow Matching |
| Li, Linkai |
O3-5, P2-24 |
Unveiling the Best Practices for Applying Speech Foundation Models to Speech Intelligibility Prediction for Hearing-Impaired People |
| Li, Xinglin |
P1-23 |
Benchmarking Sub-Genre Classification for Mainstage Dance Music |
| Li, Xinyu |
P1-23 |
Benchmarking Sub-Genre Classification for Mainstage Dance Music |
| Liao, Wei-Hsiang |
P4-18 |
Improving Inference-Time Optimisation for Vocal Effects Style Transfer with a Gaussian Prior |
P2-7 |
Can Large Language Models Predict Audio Effects Parameters from Natural Language? |
D-2 |
PCA-DiffVox: Augmenting Vocal Effects Tweakability With a Bijective Latent Space |
| Lim, Wootaek |
P1-15 |
Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine |
| Lionetti, Simone |
O3-4, P2-23 |
Listening Intention Estimation for Hearables with Natural Behavior Cues |
| Liu, Haocheng |
P3-2 |
Physically Informed Spatial Regularization for Sound Event Localization and Detection |
| Liu, Hexin |
P4-10 |
Long-Context Modeling Networks for Monaural Speech Enhancement: A Comparative Study |
| Lladó, Pedro |
P3-22 |
Perceptually-Driven Panning for an Extended Listening Area |
| Lostanlen, Vincent |
P1-18 |
Multi-Class-Token Transformer for Multitask Self-supervised Music Information Retrieval |
| Luan, Xinmeng |
P3-4 |
Physics-Informed Transfer Learning for Data-Driven Sound Source Reconstruction in Near-Field Acoustic Holography |
| Luberadzka, Joanna |
P4-6 |
MB-RIRs: a Synthetic Room Impulse Response Dataset with Frequency-Dependent Absorption Coefficients |
P1-19 |
Conditional Wave-U-Net for Acoustic Matching in Shared XR Environments |
| Lukoianov, Aleksandr |
O2-6, P2-6 |
Transcribing Rhythmic Patterns of the Guitar Track in Polyphonic Music |
M
| Ma, Teng |
D-4 |
Real-Time System for Audio-Visual Target Speech Enhancement |
| Maezawa, Akira |
P2-9 |
Learn from Virtual Guitar: A Comparative Analysis of Automatic Guitar Transcription using Synthetic and Real Audio |
| Mannanova, Alina |
O3-2, P2-21 |
Self-Steering Deep Non-Linear Spatially Selective Filters for Efficient Extraction of Moving Speakers under Weak Guidance |
| Manocha, Dinesh |
P3-21 |
SILA: Signal-to-Language Augmentation for Enhanced Control in Text-to-Audio Generation |
| Martin, Rainer |
P3-13 |
Contrastive Representation Learning for Privacy-Preserving Fine-Tuning of Audio-Visual Speech Recognition |
| Martínez-Ramírez, Marco A. |
P4-18 |
Improving Inference-Time Optimisation for Vocal Effects Style Transfer with a Gaussian Prior |
P2-7 |
Can Large Language Models Predict Audio Effects Parameters from Natural Language? |
D-2 |
PCA-DiffVox: Augmenting Vocal Effects Tweakability With a Bijective Latent Space |
| Masuyama, Yoshiki |
P4-11 |
FasTUSS: Faster Task-Aware Unified Source Separation |
O1-3, P1-3 |
Physics-Informed Direction-Aware Neural Acoustic Fields |
| Matsuda, Ryo |
O1-1, P1-1 |
Kernel Ridge Regression Based Sound Field Estimation using a Rigid Spherical Microphone Array |
| Mauch, Matthias |
P4-20 |
Self-Supervised Representation Learning with a JEPA Framework for Multi-instrument Music Transcription |
| Mawalim, Candy Olivia |
P4-16 |
Modeling Multi-Level Hearing Loss for Speech Intelligibility Prediction |
| McAuley, Julian |
P1-20 |
Fast Text-to-Audio Generation with Adversarial Post-Training |
| McPherson, Andrew |
D-1 |
Neural Audio Synthesis for Non-Keyboard Instruments |
| Mesaros, Annamaria |
P1-7 |
Online Incremental Learning for Audio Classification Using a Pretrained Audio Model |
P1-8 |
Sound Event Detection with Audio-Text Models and Heterogeneous Temporal Annotations |
| Meseguer-Brocal, Gabriel |
P1-18 |
Multi-Class-Token Transformer for Multitask Self-supervised Music Information Retrieval |
| Mesgarani, Nima |
O2-3, P2-3 |
Bridging Ears and Eyes: Analyzing Audio and Visual Large Language Models to Humans in Visible Sound Recognition and Reducing Their Sensory Gap via Cross-Modal Distillation |
| Miccini, Riccardo |
P3-8 |
Adaptive Slimming for Scalable and Efficient Speech Enhancement |
| Middelberg, Wiebke |
P1-13 |
Microphone Occlusion Mitigation for Own-Voice Enhancement in Head-Worn Microphone Arrays Using Switching-Adaptive Beamforming |
| Miotello, Federico |
P4-5 |
Low-Rank Adaptation of Deep Prior Neural Networks For Room Impulse Response Reconstruction |
| Mitcheltree, Christopher |
O2-5, P2-5 |
Modulation Discovery with Differentiable Digital Signal Processing |
| Mitsufuji, Yuki |
P4-18 |
Improving Inference-Time Optimisation for Vocal Effects Style Transfer with a Gaussian Prior |
P2-10 |
SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet |
P2-13 |
Schrödinger Bridge Consistency Trajectory Models for Speech Enhancement |
P2-7 |
Can Large Language Models Predict Audio Effects Parameters from Natural Language? |
D-2 |
PCA-DiffVox: Augmenting Vocal Effects Tweakability With a Bijective Latent Space |
| Mo, Changgeng |
O3-5, P2-24 |
Unveiling the Best Practices for Applying Speech Foundation Models to Speech Intelligibility Prediction for Hearing-Impaired People |
| Möller, Sebastian |
P2-12 |
Two Views, One Truth: Spectral and Self-Supervised Features Fusion for Robust Speech Deepfake Detection |
| Mueckl, Gregor |
P3-5 |
Scene-wide Acoustic Parameter Estimation |
| Mulimani, Manjunath |
P1-7 |
Online Incremental Learning for Audio Classification Using a Pretrained Audio Model |
| Murata, Naoki |
P2-13 |
Schrödinger Bridge Consistency Trajectory Models for Speech Enhancement |
N
| Nakamura, Tomohiko |
P4-23 |
Head-Related Transfer Function Individualization Using Anthropometric Features and Spatially Independent Latent Representation |
| Nakata, Wataru |
P3-9 |
ReverbMiipher: Generative Speech Restoration meets Reverberation Characteristics Controllability |
| Nam, Juhan |
P2-7 |
Can Large Language Models Predict Audio Effects Parameters from Natural Language? |
| Neidhardt, Annika |
P3-22 |
Perceptually-Driven Panning for an Extended Listening Area |
| Nielsen, Daniel Gert |
D-6 |
Next-Generation Synthetic Data Techniques for Training, Evaluation, and Prototyping in Audio Signal Processing |
| Nieto, Oriol |
P3-21 |
SILA: Signal-to-Language Augmentation for Enhanced Control in Text-to-Audio Generation |
| Nishigori, Shuichiro |
P2-13 |
Schrödinger Bridge Consistency Trajectory Models for Speech Enhancement |
| Niu, Ryan |
P4-23 |
Head-Related Transfer Function Individualization Using Anthropometric Features and Spatially Independent Latent Representation |
| Noufi, Camille |
P1-11 |
The Test of Auditory-Vocal Affect (TAVA) dataset |
| Novack, Zachary |
P1-20 |
Fast Text-to-Audio Generation with Adversarial Post-Training |
| Nugraha, Aditya Arie |
P3-2 |
Physically Informed Spatial Regularization for Sound Event Localization and Detection |
O
| Okamoto, Takuma |
P3-24 |
SFC-L1: Sound Field Control With Least Absolute Deviation Regression |
| Oliveira, Danilo |
D-8 |
Demonstration of LipDiffuser: Lip-to-Speech Generation with Conditional Diffusion Models Running on a Portable Device |
| Ono, Nobutaka |
P4-1 |
Incremental Averaging Method to Improve Graph-Based Time-Difference-of-Arrival Estimation |
| Østergaard, Jan |
P4-2 |
Learning Robust Spatial Representations from Binaural Audio through Feature Distillation |
P
| Paissan, Francesco |
P4-11 |
FasTUSS: Faster Task-Aware Unified Source Separation |
| Pandey, Ashutosh |
P4-8 |
Controlling the Parameterized Multi-channel Wiener Filter using a tiny neural network |
P2-14 |
A Unified Framework for Evaluating DNN-Based Feedforward, Feedback, and Hybrid Active Noise Cancellation |
| Park, Sungjoon |
P1-17 |
Learning Perceptually Relevant Temporal Envelope Morphing |
| Parker, Julian |
P1-20 |
Fast Text-to-Audio Generation with Adversarial Post-Training |
| Parker, Karen J. |
P1-11 |
The Test of Auditory-Vocal Affect (TAVA) dataset |
| Passoni, Riccardo |
P1-22 |
Diffused Responsibility: Analyzing the Energy Consumption of Generative Text-to-Audio Diffusion Models |
| Pauwels, Johan |
P3-17 |
Beyond Architecture: The Critical Impact of Inference Overlap on Music Source Separation Benchmarks |
| Peer, Tal |
D-8 |
Demonstration of LipDiffuser: Lip-to-Speech Generation with Conditional Diffusion Models Running on a Portable Device |
| Pereira, Igor |
P4-17 |
Moises-Light: Resource-efficient Band-split U-Net For Music Source Separation |
P3-19 |
Generating Separated Singing Vocals Using a Diffusion Model Conditioned on Music Mixtures |
| Pezzarossa, Luca |
P3-8 |
Adaptive Slimming for Scalable and Efficient Speech Enhancement |
| Pezzoli, Mirco |
P3-4 |
Physics-Informed Transfer Learning for Data-Driven Sound Source Reconstruction in Near-Field Acoustic Holography |
P4-5 |
Low-Rank Adaptation of Deep Prior Neural Networks For Room Impulse Response Reconstruction |
| Pia, Nicola |
P3-12 |
UBGAN: Enhancing Coded Speech with Blind and Guided Bandwidth Extension |
| Pieper, Jaden |
P2-16 |
Frequency-Domain Signal-to-Noise Ratios Illuminate the Effects of the Spectral Consistency Constraint and Griffin-Lim Algorithms |
| Pilataki, Mary |
P4-20 |
Self-Supervised Representation Learning with a JEPA Framework for Multi-instrument Music Transcription |
| Pind, Finnur |
D-6 |
Next-Generation Synthetic Data Techniques for Training, Evaluation, and Prototyping in Audio Signal Processing |
| Plaja-Roglans, Genís |
P3-19 |
Generating Separated Singing Vocals Using a Diffusion Model Conditioned on Music Mixtures |
| Plumbley, Mark D. |
P3-18 |
Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models |
D-7 |
Speech Removal Framework for Privacy-preserving Audio Recordings |
| Politis, Archontis |
P1-14 |
Multi-Utterance Speech Separation and Association Trained on Short Segments |
| Polzehl, Tim |
P2-12 |
Two Views, One Truth: Spectral and Self-Supervised Features Fusion for Robust Speech Deepfake Detection |
| Pons, Jordi |
P1-20 |
Fast Text-to-Audio Generation with Adversarial Post-Training |
| Primus, Paul |
O2-2, P2-2 |
TACOS: Temporally-Aligned Audio Captions for Language-Audio Pretraining |
| R |
| Reiss, Joshua D. |
O2-5, P2-5 |
Modulation Discovery with Differentiable Digital Signal Processing |
| Ribeiro, Juliano G. C. |
O1-1, P1-1 |
Kernel Ridge Regression Based Sound Field Estimation using a Rigid Spherical Microphone Array |
| Richard, Gaël |
P3-2 |
Physically Informed Spatial Regularization for Sound Event Localization and Detection |
| Richter, Julius |
D-8 |
Demonstration of LipDiffuser: Lip-to-Speech Generation with Conditional Diffusion Models Running on a Portable Device |
| Ritter-Gutierrez, Fabian |
P2-12 |
Two Views, One Truth: Spectral and Self-Supervised Features Fusion for Robust Speech Deepfake Detection |
| Roden, Reinhild |
P4-4 |
Soft-Constrained Spatially Selective Active Noise Control for Open-fitting Hearables |
| Roman, Adrian S. |
P1-9 |
Latent Acoustic Mapping for Direction of Arrival Estimation: A Self-Supervised Approach |
| Roman, Iran R. |
P1-9 |
Latent Acoustic Mapping for Direction of Arrival Estimation: A Self-Supervised Approach |
| Ronchini, Francesca |
P1-22 |
Diffused Responsibility: Analyzing the Energy Consumption of Generative Text-to-Audio Diffusion Models |
| Rowe, David |
P4-13 |
RADE: A Neural Codec for Transmitting Speech over HF Radio Channels |
| Roy Choudhury, Romit |
P1-12 |
Unsupervised Multi-channel Speech Dereverberation via Diffusion |
| S |
| Sach, Marvin |
P3-7 |
EffDiffSE: Efficient Diffusion-Based Frequency-Domain Speech Enhancement with Hybrid Discriminative and Generative DNNs |
| Saijo, Kohei |
P4-11 |
FasTUSS: Faster Task-Aware Unified Source Separation |
P3-16 |
Is MixIT Really Unsuitable for Correlated Sources? Exploring MixIT for Unsupervised Pre-training in Music Source Separation |
| Saito, Koichi |
P2-13 |
Schrödinger Bridge Consistency Trajectory Models for Speech Enhancement |
| Salamon, Justin |
P3-21 |
SILA: Signal-to-Language Augmentation for Enhanced Control in Text-to-Audio Generation |
| Sandler, Mark B. |
P3-17 |
Beyond Architecture: The Critical Impact of Inference Overlap on Music Source Separation Benchmarks |
D-1 |
Neural Audio Synthesis for Non-Keyboard Instruments |
| Sarti, Augusto |
P3-4 |
Physics-Informed Transfer Learning for Data-Driven Sound Source Reconstruction in Near-Field Acoustic Holography |
| Sato, Ryo |
O4-4, P4-12 |
Context-Aware Query Refinement for Target Sound Extraction: Handling Partially Matched Queries |
| Sayin, Umut |
P4-6 |
MB-RIRs: a Synthetic Room Impulse Response Dataset with Frequency-Dependent Absorption Coefficients |
P1-19 |
Conditional Wave-U-Net for Acoustic Matching in Shared XR Environments |
| Scheibler, Robin |
O4-3, P4-9 |
Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration |
P3-9 |
ReverbMiipher: Generative Speech Restoration meets Reverberation Characteristics Controllability |
P2-17 |
Source Separation by Flow Matching |
| Schlecht, Sebastian J. |
P2-11 |
Neural-Network-Based Interpolation of Late Reverberation in Coupled Spaces Using the Common Slopes Model |
| Schmid, Florian |
O2-2, P2-2 |
TACOS: Temporally-Aligned Audio Captions for Language-Audio Pretraining |
| Seetharaman, Prem |
P3-21 |
SILA: Signal-to-Language Augmentation for Enhanced Control in Text-to-Audio Generation |
| Serizel, Romain |
P1-22 |
Diffused Responsibility: Analyzing the Energy Consumption of Generative Text-to-Audio Diffusion Models |
| Serra, Xavier |
P4-6 |
MB-RIRs: a Synthetic Room Impulse Response Dataset with Frequency-Dependent Absorption Coefficients |
P3-19 |
Generating Separated Singing Vocals Using a Diffusion Model Conditioned on Music Mixtures |
| Shi, Renzheng |
P3-7 |
EffDiffSE: Efficient Diffusion-Based Frequency-Domain Speech Enhancement with Hybrid Discriminative and Generative DNNs |
| Shim, Hye-jin |
O2-4, P2-4 |
OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder |
| Shu, Hongzhi |
P1-23 |
Benchmarking Sub-Genre Classification for Mainstage Dance Music |
| Simón Gálvez, Marcos F. |
P3-23 |
Theoretical Analysis of Recursive Implementations of Multi-Channel Cross-Talk Cancellation Systems |
| Singh, Arshdeep |
D-7 |
Speech Removal Framework for Privacy-preserving Audio Recordings |
| Smaragdis, Paris |
P3-8 |
Adaptive Slimming for Scalable and Efficient Speech Enhancement |
O3-1, P2-20 |
Rethinking Non-Negative Matrix Factorization with Implicit Neural Representations |
P4-22 |
Learning to Upsample and Upmix Audio in the Latent Domain |
O4-5, P4-19 |
Combolutional Neural Networks |
| Song, Zeyang |
P4-10 |
Long-Context Modeling Networks for Monaural Speech Enhancement: A Comparative Study |
| Sontacchi, Alois |
P3-18 |
Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models |
| Souchaud, Antoine R. |
P3-22 |
Perceptually-Driven Panning for an Extended Listening Area |
| Souden, Mehrez |
O3-1, P2-20 |
Rethinking Non-Negative Matrix Factorization with Implicit Neural Representations |
| Srinivas, Shanmukha |
P4-8 |
Controlling the Parameterized Multi-channel Wiener Filter using a tiny neural network |
| Stahl, Benjamin |
P3-18 |
Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models |
| Stamatiadis, Paraskevas |
P3-10 |
IS³: Generic Impulsive–Stationary Sound Separation in Acoustic Scenes using Deep Filtering |
| Strauß, Martin |
P2-15 |
Robust Speech Activity Detection in the Presence of Singing Voice |
| Su, Jiaqi |
P4-21 |
DiTVC: One-Shot Voice Conversion via Diffusion Transformer with Environment and Speaking Rate Cloning |
| Subramani, Krishna |
O3-1, P2-20 |
Rethinking Non-Negative Matrix Factorization with Implicit Neural Representations |
| T |
| Takahashi, Akira |
P2-10 |
SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet |
| Takahashi, Shusuke |
P2-10 |
SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet |
P2-13 |
Schrödinger Bridge Consistency Trajectory Models for Speech Enhancement |
| Tan, Hao Hao |
O2-5, P2-5 |
Modulation Discovery with Differentiable Digital Signal Processing |
| Tan, Zheng-Hua |
P4-2 |
Learning Robust Spatial Representations from Binaural Audio through Feature Distillation |
| Taylor, Josiah |
P1-20 |
Fast Text-to-Audio Generation with Adversarial Post-Training |
| Tenbrinck, Daniel |
P1-16 |
VoxATtack: A Multimodal Attack on Voice Anonymization Systems |
| Tirry, Wouter |
P3-7 |
EffDiffSE: Efficient Diffusion-Based Frequency-Domain Speech Enhancement with Hybrid Discriminative and Generative DNNs |
| Töpken, Stephan |
O4-1, P4-3 |
Beamforming with Interaural Time-To-Level Difference Conversion for Hearing Loss Compensation |
| Tourbabin, Vladimir |
P1-13 |
Microphone Occlusion Mitigation for Own-Voice Enhancement in Head-Worn Microphone Arrays Using Switching-Adaptive Beamforming |
| Toyama, Keisuke |
P2-10 |
SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet |
| Trevino, Jorge |
O1-1, P1-1 |
Kernel Ridge Regression Based Sound Field Estimation using a Rigid Spherical Microphone Array |
| Tuna, Cagdas |
P4-15 |
Device-Centric Room Impulse Response Augmentation Evaluated on Room Geometry Inference |
| U |
| Unoki, Masashi |
P4-16 |
Modeling Multi-Level Hearing Loss for Speech Intelligibility Prediction |
| V |
| Vaidyanathapuram Krishnan, Venkatakrishnan |
P4-14 |
The Perception of Phase Intercept Distortion and its Application in Data Augmentation |
| Valin, Jean-Marc |
P4-13 |
RADE: A Neural Codec for Transmitting Speech over HF Radio Channels |
O3-3, P2-22 |
A Lightweight and Robust Method for Blind Wideband-to-Fullband Extension of Speech |
| van de Par, Steven |
O4-1, P4-3 |
Beamforming with Interaural Time-To-Level Difference Conversion for Hearing Loss Compensation |
| Verhulst, Sarah |
P3-14 |
Low-Complexity Individualized Noise Reduction for Real-Time Processing |
| Veronesi, Francesco |
P3-23 |
Theoretical Analysis of Recursive Implementations of Multi-Channel Cross-Talk Cancellation Systems |
| Virtanen, Tuomas |
P1-14 |
Multi-Utterance Speech Separation and Association Trained on Short Segments |
| Voran, Stephen |
P2-16 |
Frequency-Domain Signal-to-Noise Ratios Illuminate the Effects of the Spectral Consistency Constraint and Griffin-Lim Algorithms |
| W |
| Walther, Andreas |
P3-3 |
Stereo Reproduction in the Presence of Sample Rate Offsets |
P4-15 |
Device-Centric Room Impulse Response Augmentation Evaluated on Room Geometry Inference |
| Wang, Deliang |
P2-14 |
A Unified Framework for Evaluating DNN-Based Feedforward, Feedback, and Hybrid Active Noise Cancellation |
| Wang, Helin |
O2-1, P2-1 |
FlexSED: Towards Open-Vocabulary Sound Event Detection |
| Wang, Ju-Chiang |
P2-8 |
Temporal Adaptation of Pre-trained Foundation Models for Music Structure Analysis |
| Wang, Shan Xiang |
O3-5, P2-24 |
Unveiling the Best Practices for Applying Speech Foundation Models to Speech Intelligibility Prediction for Hearing-Impaired People |
| Wang, Yunyun |
P4-21 |
DiTVC: One-Shot Voice Conversion via Diffusion Transformer with Environment and Speaking Rate Cloning |
| Wang, Yuzhu |
P1-14 |
Multi-Utterance Speech Separation and Association Trained on Short Segments |
| Wang, Zhong-Qiu |
P1-12 |
Unsupervised Multi-channel Speech Dereverberation via Diffusion |
| Watanabe, Shinji |
O2-4, P2-4 |
OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder |
P4-2 |
Learning Robust Spatial Representations from Binaural Audio through Feature Distillation |
| Wen, Chuan |
P3-14 |
Low-Complexity Individualized Noise Reduction for Real-Time Processing |
| Wichern, Gordon |
P4-11 |
FasTUSS: Faster Task-Aware Unified Source Separation |
O1-3, P1-3 |
Physics-Informed Direction-Aware Neural Acoustic Fields |
| Widmer, Gerhard |
O2-2, P2-2 |
TACOS: Temporally-Aligned Audio Captions for Language-Audio Pretraining |
| Wilkins, Julia |
P3-20 |
Balancing Information Preservation and Disentanglement in Self-Supervised Music Representation Learning |
| Williamson, Donald |
P3-15 |
JSQA: Speech Quality Assessment with Perceptually-Inspired Contrastive Pretraining Based on JND Audio Pairs |
| Wong, Daniel D. E. |
P2-14 |
A Unified Framework for Evaluating DNN-Based Feedforward, Feedback, and Hybrid Active Noise Cancellation |
P1-13 |
Microphone Occlusion Mitigation for Own-Voice Enhancement in Head-Worn Microphone Arrays Using Switching-Adaptive Beamforming |
| Wu, Junkai |
O2-3, P2-3 |
Bridging Ears and Eyes: Analyzing Audio and Visual Large Language Models to Humans in Visible Sound Recognition and Reducing Their Sensory Gap via Cross-Modal Distillation |
| Wu, Yulun |
P1-12 |
Unsupervised Multi-channel Speech Dereverberation via Diffusion |
| X |
| Xiao, Tong |
P4-4 |
Soft-Constrained Spatially Selective Active Noise Control for Open-fitting Hearables |
| Xu, Buye |
P2-14 |
A Unified Framework for Evaluating DNN-Based Feedforward, Feedback, and Hybrid Active Noise Cancellation |
| Xu, Liang |
O4-2, P4-7 |
Robust One-step Speech Enhancement via Consistency Distillation |
| Xu, Yangfei |
P3-11 |
Hybrid-Sep: Language-queried audio source separation via pre-trained Model Fusion and Adversarial Consistent Training |
| Xu, Zhongweiyang |
P1-12 |
Unsupervised Multi-channel Speech Dereverberation via Diffusion |
| Y |
| Yamaoka, Kouei |
P4-1 |
Incremental Averaging Method to Improve Graph-Based Time-Difference-of-Arrival Estimation |
| Yan, Longfei Felix |
O4-2, P4-7 |
Robust One-step Speech Enhancement via Consistency Distillation |
| Yang, Gene-Ping |
P1-10 |
Distributed Asynchronous Device Speech Enhancement via Windowed Cross-Attention |
| Yang, Li-Chia |
D-4 |
Real-Time System for Audio-Visual Target Speech Enhancement |
| Yeh, Chunghsin |
P1-4 |
Room Impulse Response Generation Conditioned on Acoustic Parameters |
| Yin, Sile |
D-4 |
Real-Time System for Audio-Visual Target Speech Enhancement |
| Yoshii, Kazuyoshi |
P3-2 |
Physically Informed Spatial Regularization for Sound Event Localization and Detection |
| Yu, Chin-Yun |
P4-18 |
Improving Inference-Time Optimisation for Vocal Effects Style Transfer with a Gaussian Prior |
D-2 |
PCA-DiffVox: Augmenting Vocal Effects Tweakability With a Bijective Latent Space |
| Yuma, Koizumi (Koizumi, Yuma) |
P3-9 |
ReverbMiipher: Generative Speech Restoration meets Reverberation Characteristics Controllability |
| Z |
| Zen, Heiga |
O4-3, P4-9 |
Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration |
P3-9 |
ReverbMiipher: Generative Speech Restoration meets Reverberation Characteristics Controllability |
| Zhang, Qiquan |
P4-10 |
Long-Context Modeling Networks for Monaural Speech Enhancement: A Comparative Study |
| Zhang, Shuo |
D-4 |
Real-Time System for Audio-Visual Target Speech Enhancement |
| Zhang, Xiangyu |
P4-10 |
Long-Context Modeling Networks for Monaural Speech Enhancement: A Comparative Study |
| Zhang, Yixiao |
P2-8 |
Temporal Adaptation of Pre-trained Foundation Models for Music Structure Analysis |
| Zhang, You |
O1-5, P1-6 |
Towards Perception-Informed Latent HRTF Representations |
| Zhong, Zhi |
P2-10 |
SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet |
| Zhou, Haoshuai |
O3-5, P2-24 |
Unveiling the Best Practices for Applying Speech Foundation Models to Speech Intelligibility Prediction for Hearing-Impaired People |
| Zhou, Xiajie |
P4-16 |
Modeling Multi-Level Hearing Loss for Speech Intelligibility Prediction |
| Zukowski, Zack |
P1-20 |
Fast Text-to-Audio Generation with Adversarial Post-Training |