We compare two alternative speech intelligibility prediction algorithms: time-frequency glimpse proportion (GP) and spectro-temporal glimpsing index (STGI). Both algorithms hypothesize that listeners understand speech in challenging acoustic environments by “glimpsing” partially available information from degraded speech. GP defines glimpses as those time-frequency regions whose local signal-to-noise ratio is above a certain threshold and estimates intelligibility as the proportion of the time-frequency regions glimpsed. STGI, on the other hand, applies glimpsing to the spectro-temporal modulation (STM) domain and uses a similarity measure based on the normalized cross-correlation between the STM envelopes of the clean and degraded speech signals to estimate intelligibility as the proportion of the STM channels glimpsed. Our experimental results demonstrate that STGI extends the notion of glimpsing proportion to a wider range of distortions, including non-linear signal processing, and outperforms GP for the additive uncorrelated noise datasets we tested. Furthermore, the results show that spectro-temporal modulation analysis enables STGI to account for the effects of masker type on speech intelligibility, leading to superior performance over GP in modulated noise datasets.
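As a rough illustration of the two glimpsing definitions described above, the following is a minimal sketch rather than the published GP or STGI implementation. It assumes a plain Hann-windowed STFT in place of an auditory filterbank, separate access to the speech and noise signals, and illustrative threshold values (3 dB for local SNR, 0.5 for correlation). The function names and the generic "envelope channels" input are hypothetical. The second function shows only the thresholded normalized cross-correlation step; it does not perform the spectro-temporal modulation decomposition.

```python
import numpy as np

def power_spectrogram(x, frame_len=256, hop=128):
    """Power spectrogram from a Hann-windowed STFT (assumed stand-in for an auditory filterbank)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2  # shape: (frames, bins)

def glimpse_proportion(speech, noise, threshold_db=3.0, eps=1e-12):
    """Fraction of time-frequency cells whose local SNR exceeds threshold_db (assumed value)."""
    s = power_spectrogram(speech)
    n = power_spectrogram(noise)
    local_snr_db = 10.0 * np.log10((s + eps) / (n + eps))
    return float(np.mean(local_snr_db > threshold_db))

def ncc_glimpse_proportion(clean_env, degraded_env, threshold=0.5, eps=1e-12):
    """Fraction of channels whose clean and degraded envelopes have a normalized
    cross-correlation above threshold (assumed value). Inputs: (channels, samples)."""
    c = clean_env - clean_env.mean(axis=1, keepdims=True)
    d = degraded_env - degraded_env.mean(axis=1, keepdims=True)
    ncc = np.sum(c * d, axis=1) / (np.linalg.norm(c, axis=1) * np.linalg.norm(d, axis=1) + eps)
    return float(np.mean(ncc > threshold))

# Usage with synthetic signals (white noise standing in for speech and masker).
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
masker = rng.standard_normal(16000)
gp_high_snr = glimpse_proportion(speech, 0.01 * masker)   # masker far below the speech level
gp_low_snr = glimpse_proportion(speech, 100.0 * masker)   # masker far above the speech level
```

In this sketch the first measure depends only on local energy ratios, whereas the second depends on how well the envelope shape is preserved, which is why a correlation-based measure can also be applied to degradations that are not additive noise.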
Bibliographical note
Funding Information:
This work (A.E. & W.-Y.C.) was partly supported by the Natural Sciences and Engineering Research Council of Canada and the Demant Foundation. A portion of this work (D.F.) was also supported by the National Institutes of Health, National Institute on Deafness and Other Communication Disorders, Grant No. R01-DC015465. The authors would like to thank the following researchers for providing intelligibility data: Carol Chermaz, Cees Taal, and Steven Van Kuyk.
© 2022 Elsevier B.V.
- Spectro-temporal modulation
- Speech intelligibility