Nonintrusive Speech Intelligibility Prediction Using Convolutional Neural Networks

Asger Heidemann Andersen; Jan Mark De Haan; Zheng-Hua Tan; Jesper Jensen

doi:10.1109/TASLP.2018.2847459

Publication: Contribution to journal › Journal article › Research › peer-reviewed

42 Citations (Scopus)

515 Downloads (Pure)

Abstract

Speech Intelligibility Prediction (SIP) algorithms are becoming popular tools within the development and operation of speech processing devices and algorithms. However, many SIP algorithms require knowledge of the underlying clean speech; a signal that is often not available in real-world applications. This has led to increased interest in nonintrusive SIP algorithms, which do not require clean speech to make predictions. In this paper, we investigate the use of Convolutional Neural Networks (CNNs) for nonintrusive SIP. To do so, we utilize a CNN architecture that shows similarities to existing SIP algorithms, in terms of computational structure, and which allows for easy and meaningful visualization and interpretation of trained weights. We evaluate this architecture using a large dataset obtained by combining datasets from the literature. The proposed method shows high prediction performance when compared with four existing intrusive and nonintrusive SIP algorithms. This demonstrates the potential of deep learning for speech intelligibility prediction.

Original language	English
Journal	IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume	26
Issue number	10
Pages (from-to)	1925-1939
Number of pages	15
ISSN	2329-9290
DOI	https://doi.org/10.1109/TASLP.2018.2847459
Status	Published - Oct. 2018

Access to document

10.1109/TASLP.2018.2847459

Green Open Access manuscript (accepted manuscript, 1.62 MB)

AUB Link

Search for the material in the Aalborg University Library search engine

Other files and links

http://www.scopus.com/inward/record.url?scp=85049336756&partnerID=8YFLogxK

Citation formats

@article{eeb5d22fd8f24467b7724f1bb0ee1397,

title = "Nonintrusive Speech Intelligibility Prediction Using Convolutional Neural Networks",

abstract = "Speech Intelligibility Prediction (SIP) algorithms are becoming popular tools within the development and operation of speech processing devices and algorithms. However, many SIP algorithms require knowledge of the underlying clean speech; a signal that is often not available in real-world applications. This has led to increased interest in nonintrusive SIP algorithms, which do not require clean speech to make predictions. In this paper, we investigate the use of Convolutional Neural Networks (CNNs) for nonintrusive SIP. To do so, we utilize a CNN architecture that shows similarities to existing SIP algorithms, in terms of computational structure, and which allows for easy and meaningful visualization and interpretation of trained weights. We evaluate this architecture using a large dataset obtained by combining datasets from the literature. The proposed method shows high prediction performance when compared with four existing intrusive and nonintrusive SIP algorithms. This demonstrates the potential of deep learning for speech intelligibility prediction.",

keywords = "Nonintrusive speech intelligibility prediction, convolutional neural networks",

author = "{Heidemann Andersen}, Asger and Haan, {Jan Mark De} and Zheng-Hua Tan and Jesper Jensen",

year = "2018",

month = oct,

doi = "10.1109/TASLP.2018.2847459",

language = "English",

volume = "26",

pages = "1925--1939",

journal = "IEEE/ACM Transactions on Audio, Speech, and Language Processing",

issn = "2329-9290",

publisher = "IEEE Signal Processing Society",

number = "10",

}

TY - JOUR

T1 - Nonintrusive Speech Intelligibility Prediction Using Convolutional Neural Networks

AU - Heidemann Andersen, Asger

AU - Haan, Jan Mark De

AU - Tan, Zheng-Hua

AU - Jensen, Jesper

PY - 2018/10

Y1 - 2018/10

N2 - Speech Intelligibility Prediction (SIP) algorithms are becoming popular tools within the development and operation of speech processing devices and algorithms. However, many SIP algorithms require knowledge of the underlying clean speech; a signal that is often not available in real-world applications. This has led to increased interest in nonintrusive SIP algorithms, which do not require clean speech to make predictions. In this paper, we investigate the use of Convolutional Neural Networks (CNNs) for nonintrusive SIP. To do so, we utilize a CNN architecture that shows similarities to existing SIP algorithms, in terms of computational structure, and which allows for easy and meaningful visualization and interpretation of trained weights. We evaluate this architecture using a large dataset obtained by combining datasets from the literature. The proposed method shows high prediction performance when compared with four existing intrusive and nonintrusive SIP algorithms. This demonstrates the potential of deep learning for speech intelligibility prediction.

AB - Speech Intelligibility Prediction (SIP) algorithms are becoming popular tools within the development and operation of speech processing devices and algorithms. However, many SIP algorithms require knowledge of the underlying clean speech; a signal that is often not available in real-world applications. This has led to increased interest in nonintrusive SIP algorithms, which do not require clean speech to make predictions. In this paper, we investigate the use of Convolutional Neural Networks (CNNs) for nonintrusive SIP. To do so, we utilize a CNN architecture that shows similarities to existing SIP algorithms, in terms of computational structure, and which allows for easy and meaningful visualization and interpretation of trained weights. We evaluate this architecture using a large dataset obtained by combining datasets from the literature. The proposed method shows high prediction performance when compared with four existing intrusive and nonintrusive SIP algorithms. This demonstrates the potential of deep learning for speech intelligibility prediction.

KW - Nonintrusive speech intelligibility prediction

KW - convolutional neural networks

UR - http://www.scopus.com/inward/record.url?scp=85049336756&partnerID=8YFLogxK

U2 - 10.1109/TASLP.2018.2847459

DO - 10.1109/TASLP.2018.2847459

M3 - Journal article

SN - 2329-9290

VL - 26

SP - 1925

EP - 1939

JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing

JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing

IS - 10

ER -
