Abstract
While attention-based architectures such as Conformers excel at speech enhancement, they face challenges such as poor scalability with respect to input sequence length. In contrast, the recently proposed Extended Long Short-Term Memory (xLSTM) architecture scales linearly with sequence length. However, xLSTM-based models remain unexplored for speech enhancement. This paper introduces xLSTM-SENet, the first xLSTM-based single-channel speech enhancement system. A comparative analysis on the VoiceBank+DEMAND dataset reveals that xLSTM, and notably even LSTM, can match or outperform state-of-the-art Mamba- and Conformer-based speech enhancement systems across various model sizes. Through ablation studies, we identify key architectural design choices, such as exponential gating and bidirectionality, that contribute to its effectiveness. Our best xLSTM-based model, xLSTM-SENet2, outperforms state-of-the-art Mamba- and Conformer-based systems on the VoiceBank+DEMAND dataset.
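To make "exponential gating" and "linear scalability" concrete, here is a minimal sketch of a single-unit sLSTM recurrence, assuming the sLSTM update equations of the original xLSTM proposal: exponential input and forget gates, a normalizer state n, and a log-domain stabilizer state m that keeps exp() from overflowing. It is not the xLSTM-SENet implementation. The scalar weights, the class `ScalarSLSTM`, and the helper `bidirectional` are hypothetical, and xLSTM-SENet may use a different xLSTM cell variant with vector-valued states. Each frame is visited once with constant-size state, so cost grows linearly with the number of frames. Bidirectionality is sketched by running a second cell over the time-reversed sequence and pairing the outputs frame by frame.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


def _sigmoid(a: float) -> float:
    # Numerically stable logistic function.
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


@dataclass
class ScalarSLSTM:
    """Hypothetical one-unit sLSTM cell: w_* input, r_* recurrent, b_* bias weights."""
    w_z: float; r_z: float; b_z: float
    w_i: float; r_i: float; b_i: float
    w_f: float; r_f: float; b_f: float
    w_o: float; r_o: float; b_o: float

    def run(self, xs: List[float]) -> List[float]:
        """Process frames left to right: O(T) time and O(1) state for T frames."""
        h = c = n = 0.0      # hidden, cell and normalizer states
        m = -math.inf        # stabilizer; -inf lets the first step be set by the input gate
        outputs = []
        for x in xs:
            z = math.tanh(self.w_z * x + self.r_z * h + self.b_z)
            i_log = self.w_i * x + self.r_i * h + self.b_i  # log of exponential input gate
            f_log = self.w_f * x + self.r_f * h + self.b_f  # log of exponential forget gate
            o = _sigmoid(self.w_o * x + self.r_o * h + self.b_o)
            # Subtract the running log-domain maximum so neither gate exceeds 1.
            m_new = max(f_log + m, i_log)
            i_gate = math.exp(i_log - m_new)
            f_gate = math.exp(f_log + m - m_new)
            c = f_gate * c + i_gate * z
            n = f_gate * n + i_gate  # same gate weights as c, so n >= 1 here
            h = o * c / n
            m = m_new
            outputs.append(h)
        return outputs


def bidirectional(fwd: ScalarSLSTM, bwd: ScalarSLSTM,
                  xs: List[float]) -> List[Tuple[float, float]]:
    # Run one cell forward and another over the reversed sequence, then realign in time.
    forward = fwd.run(xs)
    backward = bwd.run(xs[::-1])[::-1]
    return list(zip(forward, backward))
```

Because c and n accumulate the same positive gate weights, c/n is a weighted average of past z values, so each output h stays in (-1, 1) even when the gate pre-activations are very large.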
Original language | English |
---|---|
Number of pages | 6 |
Status | Published 10 Jan 2025 |
Fingerprint
Dive into the research topics of 'xLSTM-SENet: xLSTM for Single-Channel Speech Enhancement'. Together they form a unique fingerprint.
Projects
- 2 ongoing
- Speech Enhancement using Sequence Modelling Neural Architectures and Spoken Large Language Models
Kühne, N. (principal investigator), Tan, Z.-H. (supervisor), Østergaard, J. (supervisor) & Jensen, J. (supervisor)
02/09/2024 → 31/08/2028
Projects: Project › PhD project
- CASPR: Centre for Acoustic Signal Processing Research
Østergaard, J. (principal investigator), Tan, Z.-H. (principal investigator) & Jensen, J. (principal investigator)
01/11/2016 → …
Projects: Project › Research