End-to-end Speech Intelligibility Prediction Using Time-Domain Fully Convolutional Neural Networks

Mathias Pedersen*, Morten Kolbæk, Asger Heidemann Andersen, Søren Holdt Jensen, Jesper Jensen

*Corresponding author

Publication: Contribution to book/anthology/report/conference proceeding; conference article in proceeding; research; peer-reviewed

Abstract

Data-driven speech intelligibility prediction has been slow to take off. Datasets of measured speech intelligibility are scarce, and so current models are relatively small and rely on hand-picked features. Classical predictors based on psychoacoustic models and heuristics are still the state-of-the-art. This work proposes a U-Net inspired fully convolutional neural network architecture, NSIP, trained and tested on ten datasets to predict intelligibility of time-domain speech. The architecture is compared to a frequency domain data-driven predictor and to the classical state-of-the-art predictors STOI, ESTOI, HASPI and SIIB. The performance of NSIP is found to be superior for datasets seen in the training phase. On unseen datasets NSIP reaches performance comparable to classical predictors.
Original language: English
Title of host publication: INTERSPEECH 2020
Number of pages: 4
Status: Accepted/In press - 26 Jul 2020

Keywords

  • Speech intelligibility
  • Convolutional networks
  • Deep Learning
