TY - JOUR
T1 - Online Multichannel Speech Enhancement Based on Recursive EM and DNN-based Speech Presence Estimation
AU - Martín-Doñas, Juan M.
AU - Jensen, Jesper
AU - Tan, Zheng-Hua
AU - Gomez, Angel
AU - Peinado, Antonio
PY - 2020/12
Y1 - 2020/12
N2 - This article presents a recursive expectation-maximization algorithm for online multichannel speech enhancement. A deep neural network mask estimator is used to compute the speech presence probability, which is then improved by means of statistical spatial models of the noisy speech and noise signals. The clean speech signal is estimated using beamforming, single-channel linear postfiltering and speech presence masking. The clean speech statistics and speech presence probabilities are finally used to compute the acoustic parameters for beamforming and postfiltering by means of maximum likelihood estimation. This iterative procedure is carried out on a frame-by-frame basis. The algorithm integrates the different estimates in a common statistical framework suitable for online scenarios. Moreover, our method can successfully exploit spectral, spatial and temporal speech properties. Our proposed algorithm is tested in different noisy environments using the multichannel recordings of the CHiME-4 database. The experimental results show that our method outperforms other related state-of-the-art approaches in noise reduction performance, while allowing low-latency processing for real-time applications.
KW - Acoustics
KW - Array signal processing
KW - Computational modeling
KW - Estimation
KW - Kalman filter
KW - Noise measurement
KW - Recursive expectation-maximization
KW - Speech enhancement
KW - Deep neural networks
KW - Multichannel speech enhancement
KW - Speech presence probability
UR - http://www.scopus.com/inward/record.url?scp=85096859719&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2020.3036776
DO - 10.1109/TASLP.2020.3036776
M3 - Journal article
SN - 2329-9290
VL - 28
SP - 3080
EP - 3094
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
M1 - 9252844
ER -