Spectro-temporal modulation glimpsing for speech intelligibility prediction

Amin Edraki*, Wai Yip Chan, Jesper Jensen, Daniel Fogerty

*Corresponding author

Publication: Contribution to journal, review article, peer-reviewed

5 Citations (Scopus)

Abstract

We compare two alternative speech intelligibility prediction algorithms: time-frequency glimpse proportion (GP) and spectro-temporal glimpsing index (STGI). Both algorithms hypothesize that listeners understand speech in challenging acoustic environments by “glimpsing” partially available information from degraded speech. GP defines glimpses as those time-frequency regions whose local signal-to-noise ratio is above a certain threshold and estimates intelligibility as the proportion of the time-frequency regions glimpsed. STGI, on the other hand, applies glimpsing to the spectro-temporal modulation (STM) domain and uses a similarity measure based on the normalized cross-correlation between the STM envelopes of the clean and degraded speech signals to estimate intelligibility as the proportion of the STM channels glimpsed. Our experimental results demonstrate that STGI extends the notion of glimpsing proportion to a wider range of distortions, including non-linear signal processing, and outperforms GP for the additive uncorrelated noise datasets we tested. Furthermore, the results show that spectro-temporal modulation analysis enables STGI to account for the effects of masker type on speech intelligibility, leading to superior performance over GP in modulated noise datasets.
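The two glimpsing definitions in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 3 dB local-SNR threshold, and the 0.5 correlation threshold are assumptions chosen for illustration. The second function reduces STGI to a single correlation per modulation channel and omits the auditory spectrogram and the STM filterbank.

```python
import math


def glimpse_proportion(clean_power, noise_power, threshold_db=3.0):
    """Fraction of time-frequency cells whose local SNR exceeds threshold_db.

    clean_power and noise_power are equally shaped 2-D lists
    (frequency x time) of strictly positive power values.
    threshold_db is an illustrative assumption, not a value from the paper.
    """
    glimpsed = 0
    total = 0
    for clean_row, noise_row in zip(clean_power, noise_power):
        for s, n in zip(clean_row, noise_row):
            local_snr_db = 10.0 * math.log10(s / n)
            if local_snr_db > threshold_db:
                glimpsed += 1
            total += 1
    return glimpsed / total


def normalized_cross_correlation(x, y):
    """Mean-removed, energy-normalized correlation of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = math.sqrt(
        sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y)
    )
    # A constant envelope has zero variance; treat it as uncorrelated.
    return num / den if den > 0 else 0.0


def modulation_glimpse_proportion(clean_envs, degraded_envs, threshold=0.5):
    """Fraction of modulation channels whose clean/degraded envelope
    correlation exceeds threshold (a hypothetical, single global value)."""
    glimpsed = [
        normalized_cross_correlation(c, d) > threshold
        for c, d in zip(clean_envs, degraded_envs)
    ]
    return sum(glimpsed) / len(glimpsed)
```

The first function needs the speech and noise powers separately, which ties it to additive noise. The second compares only clean and degraded envelopes, which is consistent with the abstract's claim that STGI also covers non-linear processing.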

Original language: English
Article number: 108620
Journal: Hearing Research
Volume: 426
ISSN: 0378-5955
Status: Published - Dec. 2022

Bibliographical note

Publisher Copyright:
© 2022 Elsevier B.V.
