TY - JOUR
T1 - Nonintrusive Speech Intelligibility Prediction Using Convolutional Neural Networks
AU - Heidemann Andersen, Asger
AU - Haan, Jan Mark De
AU - Tan, Zheng-Hua
AU - Jensen, Jesper
PY - 2018/10
Y1 - 2018/10
N2 - Speech Intelligibility Prediction (SIP) algorithms are becoming popular tools within the development and operation of speech processing devices and algorithms. However, many SIP algorithms require knowledge of the underlying clean speech, a signal that is often not available in real-world applications. This has led to increased interest in nonintrusive SIP algorithms, which do not require clean speech to make predictions. In this paper, we investigate the use of Convolutional Neural Networks (CNNs) for nonintrusive SIP. To do so, we utilize a CNN architecture that shows similarities to existing SIP algorithms, in terms of computational structure, and which allows for easy and meaningful visualization and interpretation of trained weights. We evaluate this architecture using a large dataset obtained by combining datasets from the literature. The proposed method shows high prediction performance when compared with four existing intrusive and nonintrusive SIP algorithms. This demonstrates the potential of deep learning for speech intelligibility prediction.
KW - Nonintrusive speech intelligibility prediction
KW - convolutional neural networks
UR - http://www.scopus.com/inward/record.url?scp=85049336756&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2018.2847459
DO - 10.1109/TASLP.2018.2847459
M3 - Journal article
SN - 2329-9290
VL - 26
SP - 1925
EP - 1939
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
IS - 10
ER -