Abstract
This article presents a recursive expectation-maximization algorithm for online multichannel speech enhancement. A deep neural network mask estimator is used to compute the speech presence probability, which is then improved by means of statistical spatial models of the noisy speech and noise signals. The clean speech signal is estimated using beamforming, single-channel linear postfiltering and speech presence masking. The clean speech statistics and speech presence probabilities are finally used to compute the acoustic parameters for beamforming and postfiltering by means of maximum likelihood estimation. This iterative procedure is carried out on a frame-by-frame basis. The algorithm integrates the different estimates in a common statistical framework suitable for online scenarios. Moreover, our method can successfully exploit spectral, spatial and temporal speech properties. Our proposed algorithm is tested in different noisy environments using the multichannel recordings of the CHiME-4 database. The experimental results show that our method outperforms other related state-of-the-art approaches in noise reduction performance, while allowing low-latency processing for real-time applications.
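The pipeline described above (DNN-based speech presence probabilities driving recursive updates of spatial statistics, beamforming, and masking on a frame-by-frame basis) can be illustrated with a heavily simplified sketch. This is not the authors' algorithm: the recursive covariance updates, the MVDR beamformer, and the SPP-based postfilter below are generic stand-ins for the paper's ML/EM estimators, and all function names, the smoothing factor `alpha`, and the input shapes are illustrative assumptions.

```python
import numpy as np

def mvdr_weights(phi_s, phi_n, ref=0):
    """Generic MVDR beamformer from speech/noise spatial covariances
    (a stand-in for the paper's ML-estimated beamformer):
    w = (Phi_n^{-1} Phi_s e_ref) / trace(Phi_n^{-1} Phi_s)."""
    num = np.linalg.solve(phi_n, phi_s)          # Phi_n^{-1} Phi_s
    return num[:, ref] / (np.trace(num) + 1e-10)

def enhance(frames, spp, alpha=0.95):
    """Frame-by-frame enhancement sketch for one frequency bin.

    frames: (T, M) complex STFT coefficients, M microphones.
    spp:    (T,) speech presence probabilities (e.g. from a DNN mask
            estimator, as in the article).
    alpha:  assumed recursive smoothing factor.
    """
    T, M = frames.shape
    phi_s = np.eye(M, dtype=complex) * 1e-3      # speech spatial covariance
    phi_n = np.eye(M, dtype=complex)             # noise spatial covariance
    out = np.zeros(T, dtype=complex)
    for t in range(T):
        y = frames[t][:, None]                   # (M, 1) snapshot
        p = spp[t]
        # Online covariance updates weighted by the speech presence
        # probability -- the "recursive" part of the framework.
        phi_s = alpha * phi_s + (1 - alpha) * p * (y @ y.conj().T)
        phi_n = alpha * phi_n + (1 - alpha) * (1 - p) * (y @ y.conj().T)
        w = mvdr_weights(phi_s, phi_n)
        z = w.conj() @ frames[t]                 # beamformer output
        out[t] = p * z                           # SPP masking as postfilter
    return out
```

Because every quantity is updated from the current frame only, the loop has the low, constant per-frame latency that the abstract claims for the full algorithm; the paper's actual estimators replace the ad-hoc updates shown here.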
| Original language | English |
|---|---|
| Article number | 9252844 |
| Journal | IEEE/ACM Transactions on Audio, Speech, and Language Processing |
| Volume | 28 |
| Pages (from-to) | 3080-3094 |
| Number of pages | 15 |
| ISSN | 2329-9290 |
| DOIs | |
| Publication status | Published - Dec 2020 |
Keywords
- Acoustics
- Array signal processing
- Computational modeling
- Estimation
- Kalman filter
- Noise measurement
- Recursive expectation-maximization
- Speech enhancement
- deep neural networks
- multichannel speech enhancement
- speech presence probability