Speech Intelligibility Prediction using Spectro-Temporal Modulation Analysis

Amin Edraki; Wai Yip Geoffrey Chan; Jesper Jensen; Daniel Fogerty

doi:10.1109/TASLP.2020.3039929

Research output: Contribution to journal › Journal article › Research › peer-review

14 Citations (Scopus)

152 Downloads (Pure)

Abstract

Spectro-temporal modulations are believed to mediate the analysis of speech sounds in the human primary auditory cortex. Inspired by humans' robustness in comprehending speech in challenging acoustic environments, we propose an intrusive speech intelligibility prediction (SIP) algorithm, wSTMI, for normal-hearing listeners based on spectro-temporal modulation analysis (STMA) of the clean and degraded speech signals. In the STMA, each of 55 modulation frequency channels contributes an intermediate intelligibility measure. A sparse linear model with parameters optimized using Lasso regression results in combining the intermediate measures of 8 of the most salient channels for SIP. In comparison with a suite of 10 SIP algorithms, wSTMI performs consistently well across 13 datasets, which together cover degradation conditions including modulated noise, noise reduction processing, reverberation, near-end listening enhancement, and speech interruption. We show that the optimized parameters of wSTMI may be interpreted in terms of modulation transfer functions of the human auditory system. Thus, the proposed approach offers evidence affirming previous studies of the perceptual characteristics underlying speech signal intelligibility.
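The channel-selection step described in the abstract (55 per-channel intermediate measures reduced to a sparse linear combination by Lasso regression) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the data are synthetic random numbers rather than STMA outputs, the indices in `true_channels`, the penalty `alpha`, and the helper names `soft_threshold` and `lasso_fit` are hypothetical, and the solver is a plain coordinate-descent Lasso written with the Python standard library only.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the Lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_fit(columns, targets, alpha, sweeps=50):
    """Coordinate-descent Lasso on mean-centered data.

    Minimizes (1/2n) * ||y - Xw||^2 + alpha * ||w||_1, where each entry of
    `columns` holds one channel's measure across all n conditions.
    """
    n = len(targets)
    y_mean = sum(targets) / n
    residual = [y - y_mean for y in targets]  # residual for w = 0
    centered = []
    for col in columns:
        col_mean = sum(col) / n
        centered.append([x - col_mean for x in col])
    scales = [sum(x * x for x in col) / n for col in centered]
    weights = [0.0] * len(columns)
    for _ in range(sweeps):
        for j, col in enumerate(centered):
            w_old = weights[j]
            # Correlation of channel j with the residual that excludes channel j.
            rho = sum(x * r for x, r in zip(col, residual)) / n + scales[j] * w_old
            w_new = soft_threshold(rho, alpha) / scales[j]
            if w_new != w_old:
                delta = w_new - w_old
                residual = [r - delta * x for r, x in zip(residual, col)]
                weights[j] = w_new
    return weights


# Synthetic stand-in data (assumption): 55 channels, 200 listening conditions.
rng = random.Random(0)
N_CHANNELS, N_CONDITIONS = 55, 200
true_channels = [3, 9, 14, 20, 27, 33, 41, 50]  # hypothetical "salient" channels
columns = [[rng.random() for _ in range(N_CONDITIONS)] for _ in range(N_CHANNELS)]
scores = []
for i in range(N_CONDITIONS):
    clean = sum(columns[j][i] for j in true_channels) / len(true_channels)
    scores.append(clean + rng.gauss(0.0, 0.01))  # small measurement noise

weights = lasso_fit(columns, scores, alpha=0.003)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print("channels with non-zero weight:", selected)
```

The L1 penalty drives most weights to exactly zero, which is what makes the fitted model sparse; a larger `alpha` keeps fewer channels, and a sufficiently large one removes them all.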

Original language	English
Article number	9269417
Journal	IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume	29
Pages (from-to)	210-225
Number of pages	16
ISSN	2329-9290
DOIs	https://doi.org/10.1109/TASLP.2020.3039929
Publication status	Published - Dec 2020

Keywords

spectro-temporal modulation
speech intelligibility
speech quality model

Access to Document

10.1109/TASLP.2020.3039929

Accepted author manuscript, 3.41 MB

Cite this

@article{79fe7b8d553b464095f9b9dff59cfbf6,

title = "Speech Intelligibility Prediction using Spectro-Temporal Modulation Analysis",

abstract = "Spectro-temporal modulations are believed to mediate the analysis of speech sounds in the human primary auditory cortex. Inspired by humans' robustness in comprehending speech in challenging acoustic environments, we propose an intrusive speech intelligibility prediction (SIP) algorithm, wSTMI, for normal-hearing listeners based on spectro-temporal modulation analysis (STMA) of the clean and degraded speech signals. In the STMA, each of 55 modulation frequency channels contributes an intermediate intelligibility measure. A sparse linear model with parameters optimized using Lasso regression results in combining the intermediate measures of 8 of the most salient channels for SIP. In comparison with a suite of 10 SIP algorithms, wSTMI performs consistently well across 13 datasets, which together cover degradation conditions including modulated noise, noise reduction processing, reverberation, near-end listening enhancement, and speech interruption. We show that the optimized parameters of wSTMI may be interpreted in terms of modulation transfer functions of the human auditory system. Thus, the proposed approach offers evidence affirming previous studies of the perceptual characteristics underlying speech signal intelligibility.",

keywords = "spectro-temporal modulation, speech intelligibility, speech quality model",

author = "Amin Edraki and Chan, {Wai Yip Geoffrey} and Jesper Jensen and Daniel Fogerty",

year = "2020",

month = dec,

doi = "10.1109/TASLP.2020.3039929",

language = "English",

volume = "29",

pages = "210--225",

journal = "IEEE/ACM Transactions on Audio, Speech, and Language Processing",

issn = "2329-9290",

publisher = "IEEE Signal Processing Society",

}

TY - JOUR

T1 - Speech Intelligibility Prediction using Spectro-Temporal Modulation Analysis

AU - Edraki, Amin

AU - Chan, Wai Yip Geoffrey

AU - Jensen, Jesper

AU - Fogerty, Daniel

PY - 2020/12

Y1 - 2020/12

N2 - Spectro-temporal modulations are believed to mediate the analysis of speech sounds in the human primary auditory cortex. Inspired by humans' robustness in comprehending speech in challenging acoustic environments, we propose an intrusive speech intelligibility prediction (SIP) algorithm, wSTMI, for normal-hearing listeners based on spectro-temporal modulation analysis (STMA) of the clean and degraded speech signals. In the STMA, each of 55 modulation frequency channels contributes an intermediate intelligibility measure. A sparse linear model with parameters optimized using Lasso regression results in combining the intermediate measures of 8 of the most salient channels for SIP. In comparison with a suite of 10 SIP algorithms, wSTMI performs consistently well across 13 datasets, which together cover degradation conditions including modulated noise, noise reduction processing, reverberation, near-end listening enhancement, and speech interruption. We show that the optimized parameters of wSTMI may be interpreted in terms of modulation transfer functions of the human auditory system. Thus, the proposed approach offers evidence affirming previous studies of the perceptual characteristics underlying speech signal intelligibility.

KW - spectro-temporal modulation

KW - speech intelligibility

KW - speech quality model

UR - http://www.scopus.com/inward/record.url?scp=85097126929&partnerID=8YFLogxK

U2 - 10.1109/TASLP.2020.3039929

DO - 10.1109/TASLP.2020.3039929

M3 - Journal article

AN - SCOPUS:85097126929

SN - 2329-9290

VL - 29

SP - 210

EP - 225

JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing

JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing

M1 - 9269417

ER -
