End-to-end Speech Intelligibility Prediction Using Time-Domain Fully Convolutional Neural Networks

Mathias Pedersen*, Morten Kolbæk, Asger Heidemann Andersen, Søren Holdt Jensen, Jesper Jensen

*Corresponding author for this work

Research output: Contribution to book/anthology/report/conference proceeding > Article in proceeding > Research > peer-review


Abstract

Data-driven speech intelligibility prediction has been slow to take off. Datasets of measured speech intelligibility are scarce, and so current models are relatively small and rely on hand-picked features. Classical predictors based on psychoacoustic models and heuristics are still the state-of-the-art. This work proposes a U-Net inspired fully convolutional neural network architecture, NSIP, trained and tested on ten datasets to predict intelligibility of time-domain speech. The architecture is compared to a frequency-domain data-driven predictor and to the classical state-of-the-art predictors STOI, ESTOI, HASPI and SIIB. The performance of NSIP is found to be superior for datasets seen in the training phase. On unseen datasets NSIP reaches performance comparable to classical predictors.
Original language: English
Title of host publication: INTERSPEECH 2020
Number of pages: 5
Publication date: 2020
Pages: 1151-1155
Publication status: Published - 2020
Event: Interspeech 2020 - Shanghai, China
Duration: 25 Oct 2020 to 29 Oct 2020

Conference

Conference: Interspeech 2020
Country/Territory: China
City: Shanghai
Period: 25/10/2020 to 29/10/2020
Series: Proceedings of the International Conference on Spoken Language Processing
ISSN: 1990-9772

Keywords

  • speech intelligibility prediction
  • fully convolutional neural networks
  • deep learning
