End-to-end Speech Intelligibility Prediction Using Time-Domain Fully Convolutional Neural Networks

Mathias Pedersen; Morten Kolbæk; Asger Heidemann Andersen; Søren Holdt Jensen; Jesper Jensen

doi:10.21437/Interspeech.2020-1740

Corresponding author: Mathias Pedersen

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

Abstract

Data-driven speech intelligibility prediction has been slow to take off. Datasets of measured speech intelligibility are scarce, and so current models are relatively small and rely on hand-picked features. Classical predictors based on psychoacoustic models and heuristics are still the state-of-the-art. This work proposes a U-Net inspired fully convolutional neural network architecture, NSIP, trained and tested on ten datasets to predict intelligibility of time-domain speech. The architecture is compared to a frequency domain data-driven predictor and to the classical state-of-the-art predictors STOI, ESTOI, HASPI and SIIB. The performance of NSIP is found to be superior for datasets seen in the training phase. On unseen datasets NSIP reaches performance comparable to classical predictors.

Original language	English
Title of host publication	INTERSPEECH 2020
Number of pages	5
Publication date	2020
Pages	1151-1155
DOIs	https://doi.org/10.21437/Interspeech.2020-1740
Publication status	Published - 2020
Event	Interspeech 2020 - Shanghai, China Duration: 25 Oct 2020 → 29 Oct 2020

Conference

Conference	Interspeech 2020
Country/Territory	China
City	Shanghai
Period	25/10/2020 → 29/10/2020

Series	Proceedings of the International Conference on Spoken Language Processing
ISSN	1990-9772

Keywords

speech intelligibility prediction
fully convolutional neural networks
deep learning

Access to Document

10.21437/Interspeech.2020-1740, Licence: Unspecified

Open Access Article, Final published version, 233 KB, Licence: Unspecified

1 PhD thesis

Data-Driven Speech Intelligibility Prediction
Pedersen, M., 2023, Aalborg Universitetsforlag. 114 p.
Research output: PhD thesis

Cite this

@inproceedings{34806093aeca46db8133f49202c5d6a9,
title = "End-to-end Speech Intelligibility Prediction Using Time-Domain Fully Convolutional Neural Networks",
abstract = "Data-driven speech intelligibility prediction has been slow to take off. Datasets of measured speech intelligibility are scarce, and so current models are relatively small and rely on hand-picked features. Classical predictors based on psychoacoustic models and heuristics are still the state-of-the-art. This work proposes a U-Net inspired fully convolutional neural network architecture, NSIP, trained and tested on ten datasets to predict intelligibility of time-domain speech. The architecture is compared to a frequency domain data-driven predictor and to the classical state-of-the-art predictors STOI, ESTOI, HASPI and SIIB. The performance of NSIP is found to be superior for datasets seen in the training phase. On unseen datasets NSIP reaches performance comparable to classical predictors.",
keywords = "speech intelligibility, convolutional networks, deep learning, speech intelligibility prediction, fully convolutional neural networks",
author = "Mathias Pedersen and Morten Kolb{\ae}k and Andersen, {Asger Heidemann} and Jensen, {S{\o}ren Holdt} and Jesper Jensen",
year = "2020",
doi = "10.21437/Interspeech.2020-1740",
language = "English",
series = "Proceedings of the International Conference on Spoken Language Processing",
publisher = "International Speech Communication Association",
pages = "1151--1155",
booktitle = "INTERSPEECH 2020",
note = "Interspeech 2020 ; Conference date: 25-10-2020 Through 29-10-2020",
}

Pedersen, M, Kolbæk, M, Andersen, AH, Jensen, SH & Jensen, J 2020, End-to-end Speech Intelligibility Prediction Using Time-Domain Fully Convolutional Neural Networks. in INTERSPEECH 2020. Proceedings of the International Conference on Spoken Language Processing, pp. 1151-1155, Interspeech 2020, Shanghai, China, 25/10/2020. https://doi.org/10.21437/Interspeech.2020-1740

End-to-end Speech Intelligibility Prediction Using Time-Domain Fully Convolutional Neural Networks. / Pedersen, Mathias; Kolbæk, Morten; Andersen, Asger Heidemann et al.
INTERSPEECH 2020. 2020. p. 1151-1155 (Proceedings of the International Conference on Spoken Language Processing).

TY - GEN
T1 - End-to-end Speech Intelligibility Prediction Using Time-Domain Fully Convolutional Neural Networks
AU - Pedersen, Mathias
AU - Kolbæk, Morten
AU - Andersen, Asger Heidemann
AU - Jensen, Søren Holdt
AU - Jensen, Jesper
PY - 2020
Y1 - 2020
N2 - Data-driven speech intelligibility prediction has been slow to take off. Datasets of measured speech intelligibility are scarce, and so current models are relatively small and rely on hand-picked features. Classical predictors based on psychoacoustic models and heuristics are still the state-of-the-art. This work proposes a U-Net inspired fully convolutional neural network architecture, NSIP, trained and tested on ten datasets to predict intelligibility of time-domain speech. The architecture is compared to a frequency domain data-driven predictor and to the classical state-of-the-art predictors STOI, ESTOI, HASPI and SIIB. The performance of NSIP is found to be superior for datasets seen in the training phase. On unseen datasets NSIP reaches performance comparable to classical predictors.
AB - Data-driven speech intelligibility prediction has been slow to take off. Datasets of measured speech intelligibility are scarce, and so current models are relatively small and rely on hand-picked features. Classical predictors based on psychoacoustic models and heuristics are still the state-of-the-art. This work proposes a U-Net inspired fully convolutional neural network architecture, NSIP, trained and tested on ten datasets to predict intelligibility of time-domain speech. The architecture is compared to a frequency domain data-driven predictor and to the classical state-of-the-art predictors STOI, ESTOI, HASPI and SIIB. The performance of NSIP is found to be superior for datasets seen in the training phase. On unseen datasets NSIP reaches performance comparable to classical predictors.
KW - speech intelligibility
KW - convolutional networks
KW - deep learning
KW - speech intelligibility prediction
KW - fully convolutional neural networks
U2 - 10.21437/Interspeech.2020-1740
DO - 10.21437/Interspeech.2020-1740
M3 - Article in proceeding
T3 - Proceedings of the International Conference on Spoken Language Processing
SP - 1151
EP - 1155
BT - INTERSPEECH 2020
T2 - Interspeech 2020
Y2 - 25 October 2020 through 29 October 2020
ER -
