Spectro-temporal modulation glimpsing for speech intelligibility prediction

Amin Edraki; Wai Yip Chan; Jesper Jensen; Daniel Fogerty

doi:10.1016/j.heares.2022.108620

Corresponding author: Amin Edraki

Publication: Contribution to journal › Review article › peer-reviewed

5 Citations (Scopus)

Abstract

We compare two alternative speech intelligibility prediction algorithms: time-frequency glimpse proportion (GP) and spectro-temporal glimpsing index (STGI). Both algorithms hypothesize that listeners understand speech in challenging acoustic environments by “glimpsing” partially available information from degraded speech. GP defines glimpses as those time-frequency regions whose local signal-to-noise ratio is above a certain threshold and estimates intelligibility as the proportion of the time-frequency regions glimpsed. STGI, on the other hand, applies glimpsing to the spectro-temporal modulation (STM) domain and uses a similarity measure based on the normalized cross-correlation between the STM envelopes of the clean and degraded speech signals to estimate intelligibility as the proportion of the STM channels glimpsed. Our experimental results demonstrate that STGI extends the notion of glimpsing proportion to a wider range of distortions, including non-linear signal processing, and outperforms GP for the additive uncorrelated noise datasets we tested. Furthermore, the results show that spectro-temporal modulation analysis enables STGI to account for the effects of masker type on speech intelligibility, leading to superior performance over GP in modulated noise datasets.
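
The two glimpsing ideas described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the 3 dB local-SNR threshold, and the 0.5 correlation threshold are assumptions chosen for illustration, and the STM part simply correlates whole envelopes per channel rather than reproducing the published STGI pipeline (no STM filterbank or time segmentation is modeled here).

```python
import math


def glimpse_proportion(speech_power, noise_power, threshold_db=3.0):
    """Fraction of time-frequency cells whose local SNR exceeds threshold_db.

    speech_power and noise_power are equally shaped 2-D lists
    (frequency x time) of non-negative power values.
    The 3 dB default is an illustrative assumption.
    """
    glimpsed = 0
    total = 0
    for s_row, n_row in zip(speech_power, noise_power):
        for s, n in zip(s_row, n_row):
            total += 1
            if s <= 0.0:
                continue  # no speech energy in this cell, nothing to glimpse
            # A noise-free cell counts as glimpsed; otherwise compare SNR in dB.
            if n <= 0.0 or 10.0 * math.log10(s / n) > threshold_db:
                glimpsed += 1
    return glimpsed / total if total else 0.0


def normalized_cross_correlation(x, y):
    """Zero-mean normalized cross-correlation of two equal-length envelopes."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den > 0.0 else 0.0


def stm_glimpse_proportion(clean_envelopes, degraded_envelopes, threshold=0.5):
    """Fraction of channels whose clean/degraded similarity exceeds threshold.

    Each argument is a list of envelopes, one per (hypothetical) STM channel.
    The 0.5 threshold is an illustrative assumption.
    """
    scores = [
        normalized_cross_correlation(c, d)
        for c, d in zip(clean_envelopes, degraded_envelopes)
    ]
    return sum(score > threshold for score in scores) / len(scores) if scores else 0.0
```

For example, `glimpse_proportion([[4.0, 1.0], [1.0, 8.0]], [[1.0, 1.0], [4.0, 1.0]])` counts the two cells whose local SNR is roughly 6 dB and 9 dB and ignores the cells at 0 dB and about -6 dB. The second measure needs only clean and degraded envelopes, not a separate noise signal, which is consistent with the abstract's point that STGI extends to distortions such as non-linear processing.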

Original language	English
Article number	108620
Journal	Hearing Research
Volume	426
ISSN	0378-5955
DOI	https://doi.org/10.1016/j.heares.2022.108620
Status	Published - Dec 2022

Bibliographical note

Publisher Copyright:
© 2022 Elsevier B.V.

Access to document

10.1016/j.heares.2022.108620

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10125146/pdf/nihms-1889684.pdf

Citation formats

@article{fa8cda3fc538453da8cdefd6895d5ba0,

title = "Spectro-temporal modulation glimpsing for speech intelligibility prediction",

abstract = "We compare two alternative speech intelligibility prediction algorithms: time-frequency glimpse proportion (GP) and spectro-temporal glimpsing index (STGI). Both algorithms hypothesize that listeners understand speech in challenging acoustic environments by “glimpsing” partially available information from degraded speech. GP defines glimpses as those time-frequency regions whose local signal-to-noise ratio is above a certain threshold and estimates intelligibility as the proportion of the time-frequency regions glimpsed. STGI, on the other hand, applies glimpsing to the spectro-temporal modulation (STM) domain and uses a similarity measure based on the normalized cross-correlation between the STM envelopes of the clean and degraded speech signals to estimate intelligibility as the proportion of the STM channels glimpsed. Our experimental results demonstrate that STGI extends the notion of glimpsing proportion to a wider range of distortions, including non-linear signal processing, and outperforms GP for the additive uncorrelated noise datasets we tested. Furthermore, the results show that spectro-temporal modulation analysis enables STGI to account for the effects of masker type on speech intelligibility, leading to superior performance over GP in modulated noise datasets.",

keywords = "Glimpsing, Spectro-temporal modulation, Speech intelligibility",

author = "Amin Edraki and Chan, {Wai Yip} and Jesper Jensen and Daniel Fogerty",

note = "Funding Information: This work was partly (A.E. \& W.-Y.C.) supported by the Natural Sciences and Engineering Research Council of Canada and the Demant Foundation. A portion of this work (D.F.) was also supported by the National Institutes of Health, National Institute on Deafness and Other Communication Disorders, Grant No. R01-DC015465. The authors would like to thank the following researchers for providing intelligibility data: Carol Chermaz, Cees Taal, and Steven Van Kuyk. Publisher Copyright: {\textcopyright} 2022 Elsevier B.V.",

year = "2022",

month = dec,

doi = "10.1016/j.heares.2022.108620",

language = "English",

volume = "426",

pages = "108620",

journal = "Hearing Research",

issn = "0378-5955",

publisher = "Elsevier",

}

TY - JOUR

T1 - Spectro-temporal modulation glimpsing for speech intelligibility prediction

AU - Edraki, Amin

AU - Chan, Wai Yip

AU - Jensen, Jesper

AU - Fogerty, Daniel

N1 - Funding Information: This work was partly (A.E. & W.-Y.C.) supported by the Natural Sciences and Engineering Research Council of Canada and the Demant Foundation. A portion of this work (D.F.) was also supported by the National Institutes of Health, National Institute on Deafness and Other Communication Disorders, Grant No. R01-DC015465. The authors would like to thank the following researchers for providing intelligibility data: Carol Chermaz, Cees Taal, and Steven Van Kuyk. Publisher Copyright: © 2022 Elsevier B.V.

PY - 2022/12

Y1 - 2022/12

N2 - We compare two alternative speech intelligibility prediction algorithms: time-frequency glimpse proportion (GP) and spectro-temporal glimpsing index (STGI). Both algorithms hypothesize that listeners understand speech in challenging acoustic environments by “glimpsing” partially available information from degraded speech. GP defines glimpses as those time-frequency regions whose local signal-to-noise ratio is above a certain threshold and estimates intelligibility as the proportion of the time-frequency regions glimpsed. STGI, on the other hand, applies glimpsing to the spectro-temporal modulation (STM) domain and uses a similarity measure based on the normalized cross-correlation between the STM envelopes of the clean and degraded speech signals to estimate intelligibility as the proportion of the STM channels glimpsed. Our experimental results demonstrate that STGI extends the notion of glimpsing proportion to a wider range of distortions, including non-linear signal processing, and outperforms GP for the additive uncorrelated noise datasets we tested. Furthermore, the results show that spectro-temporal modulation analysis enables STGI to account for the effects of masker type on speech intelligibility, leading to superior performance over GP in modulated noise datasets.

AB - We compare two alternative speech intelligibility prediction algorithms: time-frequency glimpse proportion (GP) and spectro-temporal glimpsing index (STGI). Both algorithms hypothesize that listeners understand speech in challenging acoustic environments by “glimpsing” partially available information from degraded speech. GP defines glimpses as those time-frequency regions whose local signal-to-noise ratio is above a certain threshold and estimates intelligibility as the proportion of the time-frequency regions glimpsed. STGI, on the other hand, applies glimpsing to the spectro-temporal modulation (STM) domain and uses a similarity measure based on the normalized cross-correlation between the STM envelopes of the clean and degraded speech signals to estimate intelligibility as the proportion of the STM channels glimpsed. Our experimental results demonstrate that STGI extends the notion of glimpsing proportion to a wider range of distortions, including non-linear signal processing, and outperforms GP for the additive uncorrelated noise datasets we tested. Furthermore, the results show that spectro-temporal modulation analysis enables STGI to account for the effects of masker type on speech intelligibility, leading to superior performance over GP in modulated noise datasets.

KW - Glimpsing

KW - Spectro-temporal modulation

KW - Speech intelligibility

UR - http://www.scopus.com/inward/record.url?scp=85139072089&partnerID=8YFLogxK

U2 - 10.1016/j.heares.2022.108620

DO - 10.1016/j.heares.2022.108620

M3 - Review article

C2 - 36175300

AN - SCOPUS:85139072089

SN - 0378-5955

VL - 426

JO - Hearing Research

JF - Hearing Research

M1 - 108620

ER -
