Complex Recurrent Variational Autoencoder for Speech Resynthesis and Enhancement

Yuying Xie*, Thomas Arildsen, Zheng Hua Tan

*Corresponding author for this work

Research output: Contribution to book/anthology/report/conference proceeding; Article in proceeding; Research; peer-review

Abstract

Generative models, which aim to learn a probability distribution over data, have been actively studied and have broad applications. This paper proposes a complex recurrent variational autoencoder (VAE) framework for modeling time series data, particularly speech signals. First, to account for the temporal structure of speech signals, we introduce a complex-valued recurrent neural network into the framework. Then, inspired by recent advances in speech enhancement and separation, the reconstruction loss in the proposed model is an L1-based loss that penalizes errors in both the complex and magnitude spectrograms. To exemplify the use of the complex generative model, we apply it first to speech resynthesis and then to speech enhancement. Experiments are conducted on the VCTK, TIMIT, and VoiceBank+DEMAND datasets. The results show that the proposed method resynthesizes complex spectrograms well and improves objective metrics of speech intelligibility and signal quality for enhancement.
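As a rough illustration of the kind of reconstruction loss the abstract describes, the following minimal sketch combines an L1 penalty on the complex spectrogram with an L1 penalty on its magnitude. The abstract does not give the exact formulation, so the function name `complex_magnitude_l1`, the weight `alpha`, the mean reduction, and the split of the complex term into real and imaginary parts are all assumptions made here for illustration, not the authors' definition.

```python
import numpy as np


def complex_magnitude_l1(est, ref, alpha=0.5):
    """Hypothetical L1 loss on complex and magnitude spectrograms.

    est, ref: complex arrays of the same shape, e.g. (frames, bins).
    alpha: assumed weight between the two terms (not taken from the paper).
    """
    est = np.asarray(est, dtype=np.complex128)
    ref = np.asarray(ref, dtype=np.complex128)
    # Complex term: L1 distance on real and imaginary parts (phase-sensitive).
    complex_term = np.mean(
        np.abs(est.real - ref.real) + np.abs(est.imag - ref.imag)
    )
    # Magnitude term: L1 distance between magnitudes (ignores phase).
    magnitude_term = np.mean(np.abs(np.abs(est) - np.abs(ref)))
    return alpha * complex_term + (1.0 - alpha) * magnitude_term
```

Under these assumptions, an estimate with the correct magnitude but the wrong phase is penalized only by the complex term, which is one reason to combine the two penalties.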

Original language: English
Title of host publication: 2024 International Joint Conference on Neural Networks, IJCNN 2024 - Proceedings
Publisher: IEEE (Institute of Electrical and Electronics Engineers)
Publication date: 2024
Article number: 10650194
ISBN (Print): 979-8-3503-5932-9
ISBN (Electronic): 979-8-3503-5931-2
Publication status: Published - 2024
Event: 2024 International Joint Conference on Neural Networks, IJCNN 2024 - Yokohama, Japan
Duration: 30 Jun 2024 to 5 Jul 2024

Conference

Conference: 2024 International Joint Conference on Neural Networks, IJCNN 2024
Country/Territory: Japan
City: Yokohama
Period: 30/06/2024 to 05/07/2024
Sponsor: Ask Corporation, et al., IEEE, IEEE Computational Intelligence Society, International Neural Network Society, Science Council of Japan
Series: International Joint Conference on Neural Networks (IJCNN)
ISSN: 2161-4407

Bibliographical note

Publisher Copyright:
© 2024 IEEE.

Keywords

  • complex recurrent neural network
  • speech enhancement
  • speech resynthesis
  • variational autoencoder
