A Bayesian Permutation Training Deep Representation Learning Method for Speech Enhancement with Variational Autoencoder

Yang Xiang*, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen

*Corresponding author for this work

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

3 Citations (Scopus)

Abstract

Recently, the variational autoencoder (VAE), a deep representation learning (DRL) model, has been used to perform speech enhancement (SE). However, to the best of our knowledge, current VAE-based SE methods only apply the VAE to model the speech signal, while noise is modeled with the traditional non-negative matrix factorization (NMF) model. One of the most important reasons for using NMF is that these VAE-based methods cannot disentangle the speech and noise latent variables from the observed signal. Based on Bayesian theory, this paper derives a novel variational lower bound for the VAE, which ensures that the VAE can be trained in a supervised manner and can disentangle the speech and noise latent variables from the observed signal. This means that the proposed method can apply the VAE to model both the speech and noise signals, which differs fundamentally from previous VAE-based SE work. More specifically, the proposed DRL method can learn to impose speech and noise signal priors on different sets of latent variables for SE. The experimental results show that the proposed method can not only disentangle the speech and noise latent variables from the observed signal, but also achieve a higher scale-invariant signal-to-distortion ratio and speech quality score than a comparable deep neural network (DNN)-based SE method.
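The core idea described in the abstract — an encoder that maps the observed (noisy) signal to a latent vector whose dimensions are split into a speech part and a noise part, each tied to its own prior — can be illustrated with a minimal numpy sketch. This is not the paper's actual network or lower bound: the linear encoder weights, dimensions, and the standard-normal KL term below are all stand-in assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W_mu, W_logvar):
    """Toy linear 'encoder': map a noisy spectrogram frame x to the
    mean and log-variance of a Gaussian posterior over the latents."""
    return W_mu @ x, W_logvar @ x

def reparameterize(mu, logvar, rng):
    # z = mu + sigma * eps (reparameterization trick)
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    # KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over dimensions
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

F, D = 8, 6       # spectrogram bins, total latent dimension (assumed)
D_speech = 3      # first D_speech latents for speech, the rest for noise
W_mu = rng.standard_normal((D, F)) * 0.1
W_logvar = rng.standard_normal((D, F)) * 0.1

x = np.abs(rng.standard_normal(F))     # stand-in noisy magnitude frame
mu, logvar = encode(x, W_mu, W_logvar)
z = reparameterize(mu, logvar, rng)
z_speech, z_noise = z[:D_speech], z[D_speech:]

# In the paper's supervised setting, separate prior terms would tie
# z_speech and z_noise to priors learned from clean speech and noise;
# here both are measured against N(0, I) purely for illustration.
kl = kl_to_standard_normal(mu, logvar)
print(z_speech.shape, z_noise.shape, kl >= 0.0)
```

The point of the split is that a decoder conditioned on `z_speech` alone can then reconstruct an estimate of the clean speech, which is what makes the supervised disentanglement useful for enhancement.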

Original language: English
Title of host publication: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Number of pages: 5
Publisher: IEEE
Publication date: 2022
Pages: 381-385
ISBN (Print): 978-1-6654-0541-6
ISBN (Electronic): 978-1-6654-0540-9
DOIs
Publication status: Published - 2022
Event: 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Virtual, Online, Singapore
Duration: 23 May 2022 - 27 May 2022

Conference

Conference: 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022
Country/Territory: Singapore
City: Virtual, Online
Period: 23/05/2022 - 27/05/2022
Sponsor: Chinese and Oriental Languages Information Processing Society (COLPIS), Singapore Exhibition and Convention Bureau, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), The Institute of Electrical and Electronics Engineers Signal Processing Society
Series: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN: 1520-6149

Bibliographical note

Funding Information:
This work was partly supported by Innovation Fund Denmark (Grant No. 9065-00046).

Publisher Copyright:
© 2022 IEEE

Keywords

  • Bayesian permutation training
  • Deep representation learning
  • speech enhancement
  • variational autoencoder

