A speech enhancement algorithm based on a non-negative hidden Markov model and Kullback-Leibler divergence

Yang Xiang; Liming Shi; Jesper Lisby Højvang; Morten Højfeldt Rasmussen; Mads Græsbøll Christensen

doi:10.1186/s13636-022-00256-5

Corresponding author: Yang Xiang

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

In this paper, we propose a supervised single-channel speech enhancement method that combines Kullback-Leibler (KL) divergence-based non-negative matrix factorization (NMF) and a hidden Markov model (NMF-HMM). With the integration of the HMM, the temporal dynamics information of speech signals can be taken into account. This method includes a training stage and an enhancement stage. In the training stage, the sum of the Poisson distribution, leading to the KL divergence measure, is used as the observation model for each state of the HMM. This ensures that a computationally efficient multiplicative update can be used for the parameter update of this model. In the online enhancement stage, a novel minimum mean square error estimator is proposed for the NMF-HMM. This estimator can be implemented using parallel computing, reducing the time complexity. Moreover, compared to the traditional NMF-based speech enhancement methods, the experimental results show that our proposed algorithm improved the short-time objective intelligibility and perceptual evaluation of speech quality by 5% and 0.18, respectively.
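
As a rough illustration of the multiplicative update mentioned in the abstract, the sketch below implements the standard Lee-Seung updates for KL-divergence NMF on a small non-negative matrix. Minimising the generalised KL divergence between V and WH is equivalent to maximising a Poisson likelihood with mean WH, which is the link the abstract alludes to. This is not the authors' NMF-HMM: it has no HMM states, no separate training and enhancement stages, and no MMSE estimator. The function names (`kl_divergence`, `matmul`, `kl_nmf`) and all parameter values are assumptions made for this example.

```python
import math
import random


def kl_divergence(V, A, eps=1e-12):
    """Generalised KL divergence D(V || A), summed over all entries."""
    total = 0.0
    for v_row, a_row in zip(V, A):
        for v, a in zip(v_row, a_row):
            a = max(a, eps)
            total += (v * math.log(v / a) if v > 0 else 0.0) - v + a
    return total


def matmul(A, B):
    """Plain-Python matrix product of two lists of lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def kl_nmf(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Approximate non-negative V (F x T) by W (F x rank) times H (rank x T)."""
    rng = random.Random(seed)
    F, T = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(F)]
    H = [[rng.random() + 0.1 for _ in range(T)] for _ in range(rank)]
    history = []
    for _ in range(n_iter):
        # H <- H * (W^T (V / WH)) / (W^T 1)
        WH = matmul(W, H)
        R = [[V[f][t] / max(WH[f][t], eps) for t in range(T)] for f in range(F)]
        for k in range(rank):
            col_sum = sum(W[f][k] for f in range(F))
            for t in range(T):
                num = sum(W[f][k] * R[f][t] for f in range(F))
                H[k][t] *= num / max(col_sum, eps)
        # W <- W * ((V / WH) H^T) / (1 H^T), using the refreshed H
        WH = matmul(W, H)
        R = [[V[f][t] / max(WH[f][t], eps) for t in range(T)] for f in range(F)]
        for k in range(rank):
            row_sum = sum(H[k][t] for t in range(T))
            for f in range(F):
                num = sum(R[f][t] * H[k][t] for t in range(T))
                W[f][k] *= num / max(row_sum, eps)
        # Track the objective after each full W/H sweep
        history.append(kl_divergence(V, matmul(W, H)))
    return W, H, history


if __name__ == "__main__":
    # Toy "spectrogram": 3 frequency bins x 4 frames (illustrative values only)
    V = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [1.0, 1.0, 1.0, 1.0]]
    W, H, history = kl_nmf(V, rank=2, n_iter=100)
    print("KL after first sweep:", history[0])
    print("KL after last sweep: ", history[-1])
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative without any explicit projection, and the divergence does not increase from one sweep to the next.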

Original language	English
Article number	22
Journal	EURASIP Journal on Audio, Speech, and Music Processing
Volume	2022
Issue number	1
ISSN	1687-4714
DOIs	https://doi.org/10.1186/s13636-022-00256-5
Publication status	Published - 8 Sept 2022

Bibliographical note

Funding Information:
This work was supported by Innovation Fund Denmark (Grant No. 9065-00046).

Publisher Copyright:
© 2022, The Author(s).

Keywords

Hidden Markov model
Kullback-Leibler divergence
Minimum mean-square error
Non-negative matrix factorization
Speech enhancement

Access to Document

10.1186/s13636-022-00256-5 (Licence: CC BY 4.0)

Open Access article, final published version, 3.16 MB (Licence: CC BY 4.0)

Cite this

@article{a59106d3620b42ba9cf3d95be2c85ed7,

title = "A speech enhancement algorithm based on a non-negative hidden {M}arkov model and {K}ullback-{L}eibler divergence",

abstract = "In this paper, we propose a supervised single-channel speech enhancement method that combines Kullback-Leibler (KL) divergence-based non-negative matrix factorization (NMF) and a hidden Markov model (NMF-HMM). With the integration of the HMM, the temporal dynamics information of speech signals can be taken into account. This method includes a training stage and an enhancement stage. In the training stage, the sum of the Poisson distribution, leading to the KL divergence measure, is used as the observation model for each state of the HMM. This ensures that a computationally efficient multiplicative update can be used for the parameter update of this model. In the online enhancement stage, a novel minimum mean square error estimator is proposed for the NMF-HMM. This estimator can be implemented using parallel computing, reducing the time complexity. Moreover, compared to the traditional NMF-based speech enhancement methods, the experimental results show that our proposed algorithm improved the short-time objective intelligibility and perceptual evaluation of speech quality by 5% and 0.18, respectively.",

keywords = "Hidden Markov model, Kullback-Leibler divergence, Minimum mean-square error, Non-negative matrix factorization, Speech enhancement",

author = "Xiang, Yang and Shi, Liming and H{\o}jvang, Jesper Lisby and Rasmussen, Morten H{\o}jfeldt and Christensen, Mads Gr{\ae}sb{\o}ll",

note = "Funding Information: This work was supported by Innovation Fund Denmark (Grant No. 9065-00046). Publisher Copyright: {\textcopyright} 2022, The Author(s).",

year = "2022",

month = sep,

day = "8",

doi = "10.1186/s13636-022-00256-5",

language = "English",

volume = "2022",

journal = "{EURASIP} Journal on Audio, Speech, and Music Processing",

issn = "1687-4714",

publisher = "Springer",

number = "1",

}

TY - JOUR

T1 - A speech enhancement algorithm based on a non-negative hidden Markov model and Kullback-Leibler divergence

AU - Xiang, Yang

AU - Shi, Liming

AU - Højvang, Jesper Lisby

AU - Rasmussen, Morten Højfeldt

AU - Christensen, Mads Græsbøll

PY - 2022/9/8

Y1 - 2022/9/8

AB - In this paper, we propose a supervised single-channel speech enhancement method that combines Kullback-Leibler (KL) divergence-based non-negative matrix factorization (NMF) and a hidden Markov model (NMF-HMM). With the integration of the HMM, the temporal dynamics information of speech signals can be taken into account. This method includes a training stage and an enhancement stage. In the training stage, the sum of the Poisson distribution, leading to the KL divergence measure, is used as the observation model for each state of the HMM. This ensures that a computationally efficient multiplicative update can be used for the parameter update of this model. In the online enhancement stage, a novel minimum mean square error estimator is proposed for the NMF-HMM. This estimator can be implemented using parallel computing, reducing the time complexity. Moreover, compared to the traditional NMF-based speech enhancement methods, the experimental results show that our proposed algorithm improved the short-time objective intelligibility and perceptual evaluation of speech quality by 5% and 0.18, respectively.

KW - Hidden Markov model

KW - Kullback-Leibler divergence

KW - Minimum mean-square error

KW - Non-negative matrix factorization

KW - Speech enhancement

UR - http://www.scopus.com/inward/record.url?scp=85138113195&partnerID=8YFLogxK

U2 - 10.1186/s13636-022-00256-5

DO - 10.1186/s13636-022-00256-5

M3 - Journal article

AN - SCOPUS:85138113195

SN - 1687-4714

VL - 2022

JO - EURASIP Journal on Audio, Speech, and Music Processing

JF - EURASIP Journal on Audio, Speech, and Music Processing

IS - 1

M1 - 22

ER -
