TY - JOUR
T1 - Data-Driven Non-Intrusive Speech Intelligibility Prediction using Speech Presence Probability
AU - Pedersen, Mathias
AU - Jensen, Søren Holdt
AU - Tan, Zheng-Hua
AU - Jensen, Jesper
PY - 2024
Y1 - 2024
N2 - Time-consuming Speech Intelligibility (SI) listening tests with human subjects can be replaced by algorithmic SI predictors. In recent years, data-driven SI predictors have been showing promising results. A major limiting factor in the advancement of data-driven SI prediction is that there is a scarcity of SI listening test data available to train the data-driven methods. In this article we propose a data-driven SI predictor that does not require access to an underlying noise-free reference signal, i.e., non-intrusive, and which does not require listening test data for training. Instead, the proposed method exploits a hypothesized link between SI and Speech Presence Probability (SPP). We show that a neural network can be trained on easily obtainable speech-in-additive-noise data to estimate SPP, and that a simple post-processing stage can be applied in order to map the estimated SPP to SI predictions with high accuracy. The proposed method is evaluated and compared to other state-of-the-art non-intrusive SI predictors, and achieves the highest performance even in the presence of processed noisy speech, which the SPP estimator has not been trained on.
AB - Time-consuming Speech Intelligibility (SI) listening tests with human subjects can be replaced by algorithmic SI predictors. In recent years, data-driven SI predictors have been showing promising results. A major limiting factor in the advancement of data-driven SI prediction is that there is a scarcity of SI listening test data available to train the data-driven methods. In this article we propose a data-driven SI predictor that does not require access to an underlying noise-free reference signal, i.e., non-intrusive, and which does not require listening test data for training. Instead, the proposed method exploits a hypothesized link between SI and Speech Presence Probability (SPP). We show that a neural network can be trained on easily obtainable speech-in-additive-noise data to estimate SPP, and that a simple post-processing stage can be applied in order to map the estimated SPP to SI predictions with high accuracy. The proposed method is evaluated and compared to other state-of-the-art non-intrusive SI predictors, and achieves the highest performance even in the presence of processed noisy speech, which the SPP estimator has not been trained on.
U2 - 10.1109/TASLP.2023.3321964
DO - 10.1109/TASLP.2023.3321964
M3 - Journal article
SN - 2329-9290
VL - 32
SP - 55
EP - 67
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
M1 - 10271546
ER -