Data-driven speech intelligibility prediction has been slow to take off. Datasets of measured speech intelligibility are scarce, and so current models are relatively small and rely on hand-picked features. Classical predictors based on psychoacoustic models and heuristics are still the state of the art. This work proposes a U-Net inspired fully convolutional neural network architecture, NSIP, trained and tested on ten datasets to predict intelligibility of time-domain speech. The architecture is compared to a frequency domain data-driven predictor and to the classical state-of-the-art predictors STOI, ESTOI, HASPI and SIIB. The performance of NSIP is found to be superior for datasets seen in the training phase. On unseen datasets NSIP reaches performance comparable to classical predictors.
|Status|Accepted/In press - 26 Jul. 2020|
- Deep Learning
Pedersen, M., Kolbæk, M., Andersen, A. H., Jensen, S. H., & Jensen, J. (Accepted/In press). End-to-end Speech Intelligibility Prediction Using Time-Domain Fully Convolutional Neural Networks. In INTERSPEECH 2020