Complex Recurrent Variational Autoencoder for Speech Resynthesis and Enhancement

Yuying Xie*, Thomas Arildsen, Zheng Hua Tan

*Corresponding author

Publication: Contribution to book/anthology/report/conference proceeding › Conference article in proceeding › Research › Peer-reviewed

Abstract

Generative models, which aim to learn a probability distribution over data, have been actively studied and have broad applications. This paper proposes a complex recurrent variational autoencoder (VAE) framework for modeling time series data, particularly speech signals. First, to account for the temporal structure of speech signals, we introduce a complex-valued recurrent neural network into the framework. Then, inspired by recent advances in speech enhancement and separation, we use an L1-based reconstruction loss that penalizes errors in both the complex and magnitude spectrograms. To exemplify the use of the complex generative model, we choose speech resynthesis first and then enhancement as the specific applications in this paper. Experiments are conducted on the VCTK, TIMIT, and VoiceBank+DEMAND datasets. The results show that the proposed method can resynthesize complex spectrograms well and offers improvements on objective metrics of speech intelligibility and signal quality for enhancement.
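
The abstract describes an L1-based reconstruction loss with penalties on both the complex and the magnitude spectrogram, but gives no formula. The following is a minimal sketch of one way such a loss could be written, not the authors' implementation. The function name `l1_complex_magnitude_loss`, the weight `alpha`, the mean reduction, and the choice to sum the real-part and imaginary-part errors are all assumptions made for illustration.

```python
import numpy as np


def l1_complex_magnitude_loss(est, ref, alpha=0.5):
    """Illustrative L1 loss on complex and magnitude spectrograms.

    est, ref : complex arrays of identical shape (e.g. frequency x time).
    alpha    : hypothetical weight between the two terms (an assumption,
               not a value taken from the paper).
    """
    est = np.asarray(est, dtype=np.complex128)
    ref = np.asarray(ref, dtype=np.complex128)
    if est.shape != ref.shape:
        raise ValueError("est and ref must have the same shape")

    # Complex term: L1 error on the real and imaginary parts.
    diff = est - ref
    complex_term = np.mean(np.abs(diff.real) + np.abs(diff.imag))

    # Magnitude term: L1 error between the two magnitude spectrograms.
    magnitude_term = np.mean(np.abs(np.abs(est) - np.abs(ref)))

    return float(alpha * complex_term + (1.0 - alpha) * magnitude_term)
```

Under these assumptions, an estimate with the correct magnitude but the wrong phase leaves the magnitude term at zero while the complex term stays positive, which is the reason for penalizing both representations.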

Original language: English
Title: 2024 International Joint Conference on Neural Networks, IJCNN 2024 - Proceedings
Publisher: IEEE (Institute of Electrical and Electronics Engineers)
Publication date: 2024
Article number: 10650194
ISBN (Print): 979-8-3503-5932-9
ISBN (Electronic): 979-8-3503-5931-2
DOI
Status: Published - 2024
Event: 2024 International Joint Conference on Neural Networks, IJCNN 2024 - Yokohama, Japan
Duration: 30 Jun 2024 to 5 Jul 2024

Conference

Conference: 2024 International Joint Conference on Neural Networks, IJCNN 2024
Country/Territory: Japan
City: Yokohama
Period: 30/06/2024 to 05/07/2024
Sponsor: Ask Corporation, et al., IEEE, IEEE Computational Intelligence Society, International Neural Network Society, Science Council of Japan
Name: International Joint Conference on Neural Networks (IJCNN)
ISSN: 2161-4407

Bibliographical note

Publisher Copyright:
© 2024 IEEE.
