Nonintrusive Speech Intelligibility Prediction Using Convolutional Neural Networks

Asger Heidemann Andersen, Jan Mark De Haan, Zheng-Hua Tan, Jesper Jensen

Publication: Contribution to journal › Journal article › Research › Peer-reviewed

Abstract

Speech Intelligibility Prediction (SIP) algorithms are becoming popular tools within the development and operation of speech processing devices and algorithms. However, many SIP algorithms require knowledge of the underlying clean speech; a signal that is often not available in real-world applications. This has led to increased interest in nonintrusive SIP algorithms, which do not require clean speech to make predictions. In this paper, we investigate the use of Convolutional Neural Networks (CNNs) for nonintrusive SIP. To do so, we utilize a CNN architecture that shows similarities to existing SIP algorithms, in terms of computational structure, and which allows for easy and meaningful visualization and interpretation of trained weights. We evaluate this architecture using a large dataset obtained by combining datasets from the literature. The proposed method shows high prediction performance when compared with four existing intrusive and nonintrusive SIP algorithms. This demonstrates the potential of deep learning for speech intelligibility prediction.

Original language: English
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 26
Issue number: 10
Pages (from-to): 1925-1939
Number of pages: 15
ISSN: 2329-9290
DOI: 10.1109/TASLP.2018.2847459
Status: Published - Oct. 2018

Fingerprint

Speech intelligibility
intelligibility
Neural networks
predictions
Speech processing
performance prediction
Network architecture
learning
Visualization

Keywords

Nonintrusive speech intelligibility prediction
convolutional neural networks

    Cite this

    @article{eeb5d22fd8f24467b7724f1bb0ee1397,
    title = "Nonintrusive Speech Intelligibility Prediction Using Convolutional Neural Networks",
    abstract = "Speech Intelligibility Prediction (SIP) algorithms are becoming popular tools within the development and operation of speech processing devices and algorithms. However, many SIP algorithms require knowledge of the underlying clean speech; a signal that is often not available in real-world applications. This has led to increased interest in nonintrusive SIP algorithms, which do not require clean speech to make predictions. In this paper, we investigate the use of Convolutional Neural Networks (CNNs) for nonintrusive SIP. To do so, we utilize a CNN architecture that shows similarities to existing SIP algorithms, in terms of computational structure, and which allows for easy and meaningful visualization and interpretation of trained weights. We evaluate this architecture using a large dataset obtained by combining datasets from the literature. The proposed method shows high prediction performance when compared with four existing intrusive and nonintrusive SIP algorithms. This demonstrates the potential of deep learning for speech intelligibility prediction.",
    keywords = "Nonintrusive speech intelligibility prediction, convolutional neural networks",
    author = "{Heidemann Andersen}, Asger and Haan, {Jan Mark De} and Zheng-Hua Tan and Jesper Jensen",
    year = "2018",
    month = "10",
    doi = "10.1109/TASLP.2018.2847459",
    language = "English",
    volume = "26",
    pages = "1925--1939",
    journal = "IEEE/ACM Transactions on Audio, Speech, and Language Processing",
    issn = "2329-9290",
    publisher = "IEEE Signal Processing Society",
    number = "10",
    }

    Nonintrusive Speech Intelligibility Prediction Using Convolutional Neural Networks. / Heidemann Andersen, Asger; Haan, Jan Mark De; Tan, Zheng-Hua; Jensen, Jesper.

    In: IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 26, No. 10, 10.2018, p. 1925-1939.

    TY - JOUR

    T1 - Nonintrusive Speech Intelligibility Prediction Using Convolutional Neural Networks

    AU - Heidemann Andersen, Asger

    AU - Haan, Jan Mark De

    AU - Tan, Zheng-Hua

    AU - Jensen, Jesper

    PY - 2018/10

    Y1 - 2018/10

    AB - Speech Intelligibility Prediction (SIP) algorithms are becoming popular tools within the development and operation of speech processing devices and algorithms. However, many SIP algorithms require knowledge of the underlying clean speech; a signal that is often not available in real-world applications. This has led to increased interest in nonintrusive SIP algorithms, which do not require clean speech to make predictions. In this paper, we investigate the use of Convolutional Neural Networks (CNNs) for nonintrusive SIP. To do so, we utilize a CNN architecture that shows similarities to existing SIP algorithms, in terms of computational structure, and which allows for easy and meaningful visualization and interpretation of trained weights. We evaluate this architecture using a large dataset obtained by combining datasets from the literature. The proposed method shows high prediction performance when compared with four existing intrusive and nonintrusive SIP algorithms. This demonstrates the potential of deep learning for speech intelligibility prediction.

    KW - Nonintrusive speech intelligibility prediction

    KW - convolutional neural networks

    UR - http://www.scopus.com/inward/record.url?scp=85049336756&partnerID=8YFLogxK

    U2 - 10.1109/TASLP.2018.2847459

    DO - 10.1109/TASLP.2018.2847459

    M3 - Journal article

    VL - 26

    SP - 1925

    EP - 1939

    JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing

    JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing

    SN - 2329-9290

    IS - 10

    ER -