Abstract
Generative models, which aim to learn a probabilistic distribution over data, have been actively studied and find broad applications. This paper proposes a complex-valued recurrent variational autoencoder (VAE) framework for modeling time series data, particularly speech signals. First, to account for the temporal structure of speech signals, we introduce a complex-valued recurrent neural network into the framework. Then, inspired by recent advances in speech enhancement and separation, the reconstruction loss in the proposed model is an L1-based loss that penalizes errors on both the complex and magnitude spectrograms. To exemplify the use of the complex generative model, we first apply it to speech resynthesis and then to speech enhancement. Experiments are conducted on the VCTK, TIMIT, and VoiceBank+DEMAND datasets. The results show that the proposed method can resynthesize the complex spectrogram well, and offers improvements on objective metrics of speech intelligibility and signal quality for enhancement.
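The abstract describes an L1-based reconstruction loss penalizing both the complex and the magnitude spectrograms. As a rough illustration (not the paper's exact formulation), such a loss can be sketched as an L1 term on the real and imaginary parts plus an L1 term on the magnitudes; the weighting factor `alpha` below is an assumption for illustration.

```python
import numpy as np

def l1_complex_magnitude_loss(s_true, s_pred, alpha=0.5):
    """Sketch of an L1 reconstruction loss over complex spectrograms.

    s_true, s_pred : complex-valued arrays of shape (freq, time).
    alpha          : hypothetical weight balancing the two terms
                     (not specified in the abstract).
    """
    # L1 penalty on the complex spectrogram (real and imaginary parts)
    complex_term = np.mean(np.abs(s_true.real - s_pred.real)
                           + np.abs(s_true.imag - s_pred.imag))
    # L1 penalty on the magnitude spectrogram
    mag_term = np.mean(np.abs(np.abs(s_true) - np.abs(s_pred)))
    return alpha * complex_term + (1.0 - alpha) * mag_term
```

A perfect reconstruction yields zero loss, while the magnitude term alone cannot distinguish signals that differ only in phase, which motivates keeping the complex-domain term.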
Original language | English |
---|---|
Title | 2024 International Joint Conference on Neural Networks, IJCNN 2024 - Proceedings |
Publisher | IEEE (Institute of Electrical and Electronics Engineers) |
Publication date | 2024 |
Article number | 10650194 |
ISBN (Print) | 979-8-3503-5932-9 |
ISBN (Electronic) | 979-8-3503-5931-2 |
DOI | |
Status | Published - 2024 |
Event | 2024 International Joint Conference on Neural Networks, IJCNN 2024 - Yokohama, Japan Duration: 30 Jun 2024 → 5 Jul 2024 |
Conference
Conference | 2024 International Joint Conference on Neural Networks, IJCNN 2024 |
---|---|
Country/Territory | Japan |
City | Yokohama |
Period | 30/06/2024 → 05/07/2024 |
Sponsor | Ask Corporation, et al., IEEE, IEEE Computational Intelligence Society, International Neural Network Society, Science Council of Japan |
Name | International Joint Conference on Neural Networks (IJCNN) |
---|---|
ISSN | 2161-4407 |
Bibliographic note
Publisher Copyright: © 2024 IEEE.