Speech Intelligibility Prediction using Spectro-Temporal Modulation Analysis

Amin Edraki, Wai Yip Geoffrey Chan, Jesper Jensen, Daniel Fogerty

Research output: Contribution to journal, journal article, research, peer-reviewed

14 Citations (Scopus)
152 Downloads (Pure)

Abstract

Spectro-temporal modulations are believed to mediate the analysis of speech sounds in the human primary auditory cortex. Inspired by humans' robustness in comprehending speech in challenging acoustic environments, we propose an intrusive speech intelligibility prediction (SIP) algorithm, wSTMI, for normal-hearing listeners based on spectro-temporal modulation analysis (STMA) of the clean and degraded speech signals. In the STMA, each of 55 modulation frequency channels contributes an intermediate intelligibility measure. A sparse linear model, with parameters optimized using Lasso regression, combines the intermediate measures of 8 of the most salient channels for SIP. In comparison with a suite of 10 SIP algorithms, wSTMI performs consistently well across 13 datasets, which together cover degradation conditions including modulated noise, noise reduction processing, reverberation, near-end listening enhancement, and speech interruption. We show that the optimized parameters of wSTMI may be interpreted in terms of modulation transfer functions of the human auditory system. Thus, the proposed approach offers evidence affirming previous studies of the perceptual characteristics underlying speech signal intelligibility.
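
The channel-selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' wSTMI implementation: the per-channel intermediate measures below are synthetic random numbers, and the names `soft_threshold`, `fit_lasso`, and `predict`, the penalty value, and the choice of which 8 channels are informative are all assumptions made for illustration. The sketch only shows how an L1 (Lasso) penalty can drive most of 55 channel weights to exactly zero, leaving a sparse linear combination.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the Lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso(features, targets, penalty, n_iters=100):
    """Coordinate-descent Lasso without an intercept.

    Minimizes (1 / (2n)) * sum_i (y_i - w . x_i)^2 + penalty * sum_j |w_j|.
    """
    n_samples = len(features)
    n_channels = len(features[0])
    weights = [0.0] * n_channels
    residual = list(targets)  # equals y - X w, and w starts at zero
    col_power = [sum(row[j] ** 2 for row in features) / n_samples
                 for j in range(n_channels)]
    for _ in range(n_iters):
        for j in range(n_channels):
            if col_power[j] == 0.0:
                continue
            # Correlation of channel j with the residual that excludes channel j.
            rho = sum(row[j] * (r + weights[j] * row[j])
                      for row, r in zip(features, residual)) / n_samples
            new_weight = soft_threshold(rho, penalty) / col_power[j]
            delta = new_weight - weights[j]
            if delta != 0.0:
                for i, row in enumerate(features):
                    residual[i] -= delta * row[j]
                weights[j] = new_weight
    return weights


def predict(weights, channel_measures):
    """Weighted sum of per-channel intermediate measures."""
    return sum(w * m for w, m in zip(weights, channel_measures))


if __name__ == "__main__":
    random.seed(0)
    n_channels, n_conditions = 55, 120
    # Hypothetical ground truth: only 8 of the 55 channels carry information.
    true_weights = [0.0] * n_channels
    for k in range(8):
        true_weights[k * 7] = 1.0
    # Synthetic stand-ins for per-channel intermediate intelligibility measures.
    features = [[random.random() for _ in range(n_channels)]
                for _ in range(n_conditions)]
    targets = [predict(true_weights, row) + random.gauss(0.0, 0.05)
               for row in features]
    weights = fit_lasso(features, targets, penalty=0.02)
    selected = [j for j, w in enumerate(weights) if w != 0.0]
    print(len(selected), "channels kept:", selected)
```

A larger `penalty` keeps fewer channels and a smaller one keeps more, so in practice the value would be tuned, for example by cross-validation against listening-test scores. How the published model maps the combined measure to intelligibility scores is not stated in the abstract and is not modeled here.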

Original language: English
Article number: 9269417
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 29
Pages (from-to): 210-225
Number of pages: 16
ISSN: 2329-9290
Publication status: Published - Dec 2020

Keywords

  • spectro-temporal modulation
  • speech intelligibility
  • speech quality model
