Online Multichannel Speech Enhancement Based on Recursive EM and DNN-based Speech Presence Estimation

Juan M. Martín-Doñas; Jesper Jensen; Zheng-Hua Tan; Angel Gomez; Antonio Peinado

doi:10.1109/TASLP.2020.3036776

Publication: Contribution to journal › Journal article › Research › peer-review

11 Citations (Scopus)

214 Downloads (Pure)

Abstract

This article presents a recursive expectation-maximization algorithm for online multichannel speech enhancement. A deep neural network mask estimator is used to compute the speech presence probability, which is then improved by means of statistical spatial models of the noisy speech and noise signals. The clean speech signal is estimated using beamforming, single-channel linear postfiltering and speech presence masking. The clean speech statistics and speech presence probabilities are finally used to compute the acoustic parameters for beamforming and postfiltering by means of maximum likelihood estimation. This iterative procedure is carried out on a frame-by-frame basis. The algorithm integrates the different estimates in a common statistical framework suitable for online scenarios. Moreover, our method can successfully exploit spectral, spatial and temporal speech properties. Our proposed algorithm is tested in different noisy environments using the multichannel recordings of the CHiME-4 database. The experimental results show that our method outperforms other related state-of-the-art approaches in noise reduction performance, while allowing low-latency processing for real-time applications.
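
As a rough illustration of two ingredients named in the abstract (frame-by-frame recursive statistics driven by a speech presence probability, and beamforming), the following Python sketch updates a noise spatial covariance matrix for a single frequency bin and derives MVDR beamformer weights from it. It is not the authors' algorithm: the two-microphone setup, the smoothing constant `alpha`, the steering vector `d`, and all function names are assumptions made for illustration, and the paper's EM updates, postfilter and DNN mask estimator are not reproduced here.

```python
# Hypothetical sketch: one frequency bin, two microphones, pure Python.
# All names and constants are illustrative assumptions, not taken from the paper.

def outer(y):
    """Return the 2x2 outer product y y^H as nested lists."""
    return [[y[i] * y[j].conjugate() for j in range(2)] for i in range(2)]


def update_noise_cov(phi, y, spp, alpha=0.9):
    """Recursively smooth the noise covariance with the new frame y.

    spp is the speech presence probability in [0, 1]. The effective
    smoothing factor approaches 1 when speech is likely present, so
    speech-dominated frames barely change the noise estimate.
    """
    a = alpha + (1.0 - alpha) * spp
    yy = outer(y)
    return [[a * phi[i][j] + (1.0 - a) * yy[i][j] for j in range(2)]
            for i in range(2)]


def inv2(m):
    """Invert a 2x2 complex matrix in closed form."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def mvdr_weights(phi, d):
    """MVDR weights w = phi^{-1} d / (d^H phi^{-1} d) for steering vector d."""
    pinv = inv2(phi)
    num = [pinv[i][0] * d[0] + pinv[i][1] * d[1] for i in range(2)]
    den = sum(d[i].conjugate() * num[i] for i in range(2))
    return [n / den for n in num]


def apply_beamformer(w, y):
    """Beamformer output w^H y for one frame."""
    return sum(w[i].conjugate() * y[i] for i in range(2))


# Frame-by-frame processing of a few made-up (frame, spp) pairs.
phi = [[1e-3 + 0j, 0j], [0j, 1e-3 + 0j]]  # small diagonal initialisation
d = [1 + 0j, 0 + 1j]                      # assumed steering vector
frames = [([0.3 + 0.1j, -0.2 + 0.4j], 0.05),
          ([1.0 + 0.5j, -0.5 + 1.0j], 0.95),
          ([0.1 - 0.2j, 0.3 + 0.0j], 0.10)]
for y, spp in frames:
    phi = update_noise_cov(phi, y, spp)
    w = mvdr_weights(phi, d)
    print(apply_beamformer(w, y))
```

By construction the weights satisfy the distortionless constraint w^H d = 1, so a signal arriving along the assumed steering vector passes unchanged while the noise covariance estimate shapes the suppression of everything else.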

Original language	English
Article number	9252844
Journal	IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume	28
Pages (from-to)	3080-3094
Number of pages	15
ISSN	2329-9290
DOI	https://doi.org/10.1109/TASLP.2020.3036776
Status	Published - Dec 2020

Citation formats

@article{1357755664334324a69c7167ca6b7e2a,

title = "Online Multichannel Speech Enhancement Based on Recursive EM and DNN-based Speech Presence Estimation",

abstract = "This article presents a recursive expectation-maximization algorithm for online multichannel speech enhancement. A deep neural network mask estimator is used to compute the speech presence probability, which is then improved by means of statistical spatial models of the noisy speech and noise signals. The clean speech signal is estimated using beamforming, single-channel linear postfiltering and speech presence masking. The clean speech statistics and speech presence probabilities are finally used to compute the acoustic parameters for beamforming and postfiltering by means of maximum likelihood estimation. This iterative procedure is carried out on a frame-by-frame basis. The algorithm integrates the different estimates in a common statistical framework suitable for online scenarios. Moreover, our method can successfully exploit spectral, spatial and temporal speech properties. Our proposed algorithm is tested in different noisy environments using the multichannel recordings of the CHiME-4 database. The experimental results show that our method outperforms other related state-of-the-art approaches in noise reduction performance, while allowing low-latency processing for real-time applications.",

keywords = "Acoustics, Array signal processing, Computational modeling, Estimation, Kalman filter, Noise measurement, Recursive expectation-maximization, Speech enhancement, deep neural networks, multichannel speech enhancement, speech presence probability",

author = "Mart{\'i}n-Do{\~n}as, {Juan M.} and Jesper Jensen and Zheng-Hua Tan and Angel Gomez and Antonio Peinado",

year = "2020",

month = dec,

doi = "10.1109/TASLP.2020.3036776",

language = "English",

volume = "28",

pages = "3080--3094",

journal = "IEEE/ACM Transactions on Audio, Speech, and Language Processing",

issn = "2329-9290",

publisher = "IEEE Signal Processing Society",

}

TY - JOUR

T1 - Online Multichannel Speech Enhancement Based on Recursive EM and DNN-based Speech Presence Estimation

AU - Martín-Doñas, Juan M.

AU - Jensen, Jesper

AU - Tan, Zheng-Hua

AU - Gomez, Angel

AU - Peinado, Antonio

PY - 2020/12

Y1 - 2020/12

AB - This article presents a recursive expectation-maximization algorithm for online multichannel speech enhancement. A deep neural network mask estimator is used to compute the speech presence probability, which is then improved by means of statistical spatial models of the noisy speech and noise signals. The clean speech signal is estimated using beamforming, single-channel linear postfiltering and speech presence masking. The clean speech statistics and speech presence probabilities are finally used to compute the acoustic parameters for beamforming and postfiltering by means of maximum likelihood estimation. This iterative procedure is carried out on a frame-by-frame basis. The algorithm integrates the different estimates in a common statistical framework suitable for online scenarios. Moreover, our method can successfully exploit spectral, spatial and temporal speech properties. Our proposed algorithm is tested in different noisy environments using the multichannel recordings of the CHiME-4 database. The experimental results show that our method outperforms other related state-of-the-art approaches in noise reduction performance, while allowing low-latency processing for real-time applications.

KW - Acoustics

KW - Array signal processing

KW - Computational modeling

KW - Estimation

KW - Kalman filter

KW - Noise measurement

KW - Recursive expectation-maximization

KW - Speech enhancement

KW - deep neural networks

KW - multichannel speech enhancement

KW - speech presence probability

UR - http://www.scopus.com/inward/record.url?scp=85096859719&partnerID=8YFLogxK

U2 - 10.1109/TASLP.2020.3036776

DO - 10.1109/TASLP.2020.3036776

M3 - Journal article

SN - 2329-9290

VL - 28

SP - 3080

EP - 3094

JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing

JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing

M1 - 9252844

ER -
