Abstract
Generative models, which aim to learn a probability distribution over data, have been actively studied and have broad applications. This paper proposes a complex recurrent variational autoencoder (VAE) framework for modeling time series data, particularly speech signals. First, to account for the temporal structure of speech signals, we introduce a complex-valued recurrent neural network into the framework. Then, inspired by recent advances in speech enhancement and separation, we use an L1-based reconstruction loss that penalizes errors in both the complex and magnitude spectrograms. To exemplify the use of the complex generative model, we choose speech resynthesis and then speech enhancement as the applications in this paper. Experiments are conducted on the VCTK, TIMIT, and VoiceBank+DEMAND datasets. The results show that the proposed method resynthesizes complex spectrograms well and improves objective metrics of speech intelligibility and signal quality for enhancement.
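As a rough illustration of an L1-based reconstruction loss covering both the complex and magnitude spectrograms, the following NumPy sketch may help. The abstract does not give the exact formulation, so the function name `complex_magnitude_l1`, the weight `alpha`, and the choice to average over all time-frequency bins are assumptions made for illustration, not the paper's definition.

```python
import numpy as np


def complex_magnitude_l1(est, ref, alpha=0.5):
    """Illustrative L1 loss on complex and magnitude spectrograms.

    est, ref: complex arrays of identical shape, e.g. (frames, freq_bins).
    alpha: hypothetical weight balancing the two terms (an assumption,
           not a value taken from the paper).
    """
    est = np.asarray(est, dtype=np.complex128)
    ref = np.asarray(ref, dtype=np.complex128)

    # Complex term: L1 distance on the real and imaginary parts.
    complex_term = np.mean(
        np.abs(est.real - ref.real) + np.abs(est.imag - ref.imag)
    )

    # Magnitude term: L1 distance between the spectrogram magnitudes.
    magnitude_term = np.mean(np.abs(np.abs(est) - np.abs(ref)))

    return alpha * complex_term + (1.0 - alpha) * magnitude_term
```

In this sketch, a pure phase error (for example an estimate of `1j` against a reference of `1`) leaves the magnitude term at zero but is still penalized by the complex term, which is one plausible reason to combine the two.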
Original language | English |
---|---|
Title of host publication | 2024 International Joint Conference on Neural Networks, IJCNN 2024 - Proceedings |
Publisher | IEEE (Institute of Electrical and Electronics Engineers) |
Publication date | 2024 |
Article number | 10650194 |
ISBN (Print) | 979-8-3503-5932-9 |
ISBN (Electronic) | 979-8-3503-5931-2 |
Publication status | Published - 2024 |
Event | 2024 International Joint Conference on Neural Networks, IJCNN 2024 - Yokohama, Japan; Duration: 30 Jun 2024 → 5 Jul 2024 |
Conference
Conference | 2024 International Joint Conference on Neural Networks, IJCNN 2024 |
---|---|
Country/Territory | Japan |
City | Yokohama |
Period | 30/06/2024 → 05/07/2024 |
Sponsor | Ask Corporation, et al., IEEE, IEEE Computational Intelligence Society, International Neural Network Society, Science Council of Japan |
Series | International Joint Conference on Neural Networks (IJCNN) |
ISSN | 2161-4407 |
Bibliographical note
Publisher Copyright: © 2024 IEEE.
Keywords
- complex recurrent neural network
- speech enhancement
- speech resynthesis
- variational autoencoder