Performance evaluation of the short-time objective intelligibility measure with different band importance functions

Asger Heidemann Andersen; Jan Mark de Haan; Zheng-Hua Tan; Jesper Jensen

Publication: Conference contribution without publisher/journal › Poster › Research

Abstract

Methods for speech intelligibility prediction are quickly becoming popular tools within the speech processing community. Such methods can easily and objectively estimate the effectiveness of different speech enhancement schemes. The short-time objective intelligibility (STOI) measure has enjoyed particular popularity due to its simplicity and its proven ability to provide accurate predictions across a wide range of conditions. The STOI measure has a simple structure, similar to that of many other intelligibility measures: 1) clean and degraded speech signals are split into one-third octave bands with a filter bank, 2) envelopes are extracted from each band, 3) the temporal correlation between clean and degraded envelopes is computed in short time segments, and 4) the correlations are averaged across time and frequency bands to obtain the final output. An unusual choice in the design of the STOI measure is that all frequency bands are weighted equally in the final measure. This contrasts with classical methods such as the speech intelligibility index (SII), which employs empirically determined band importance functions (BIFs) that specify the relative contribution of each frequency band to intelligibility.
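The following Python sketch illustrates the last two steps and shows where a BIF would enter. It is a minimal illustration under stated assumptions, not the reference STOI implementation. It assumes the one-third octave band envelopes from steps 1 and 2 are already available as lists of frame values. It also uses an assumed segment length of 30 frames with a one-frame shift, and it omits the normalization and clipping that the original STOI applies to the degraded envelopes. The function names `pearson`, `segment_correlations` and `weighted_score` are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def segment_correlations(clean: Sequence[float], degraded: Sequence[float],
                         seg_len: int = 30) -> List[float]:
    """Step 3 (simplified): correlate clean and degraded envelopes of one band
    in overlapping segments of seg_len frames, shifted by one frame.
    Assumes each envelope is at least seg_len frames long."""
    return [pearson(clean[m:m + seg_len], degraded[m:m + seg_len])
            for m in range(len(clean) - seg_len + 1)]


def weighted_score(clean_bands: Sequence[Sequence[float]],
                   degraded_bands: Sequence[Sequence[float]],
                   bif: Optional[Sequence[float]] = None,
                   seg_len: int = 30) -> float:
    """Step 4: average over segments within each band, then across bands.

    bif=None gives every band the same weight (as in the original STOI);
    otherwise the given band weights are normalized to sum to one.
    """
    band_means = []
    for clean, degraded in zip(clean_bands, degraded_bands):
        d = segment_correlations(clean, degraded, seg_len)
        band_means.append(sum(d) / len(d))
    weights = [1.0] * len(band_means) if bif is None else list(bif)
    total = sum(weights)
    return sum(w * d for w, d in zip(weights, band_means)) / total


if __name__ == "__main__":
    # Two toy "bands": one perfectly correlated, one perfectly anti-correlated.
    clean = [float(i % 7) for i in range(40)]
    degraded = [[2.0 * v + 1.0 for v in clean], [-v for v in clean]]
    print(weighted_score([clean, clean], degraded))             # uniform BIF
    print(weighted_score([clean, clean], degraded, bif=[3, 1]))  # fitted-style BIF
```

In the toy example the uniform weighting averages a correlation of about 1 and one of about -1 to roughly 0. The weights `[3, 1]` give roughly 0.5. Averaging within each band first and then across bands is equivalent to a grand mean over all band and segment pairs whenever every band has the same number of segments.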

In this study, we investigated the use of BIFs in the STOI measure. BIFs were fitted to several datasets of measured intelligibility so as to minimize the root-mean-squared prediction error. We then cross-evaluated the obtained BIFs on all datasets using three performance measures: root-mean-squared error, Pearson correlation, and Kendall rank correlation. The results show substantially improved performance when fitting and evaluating on the same dataset. However, this advantage does not necessarily persist when fitting and evaluating on different datasets: when the datasets used for fitting and evaluation differ substantially, performance may be poor. In contrast, the uniform BIF used in the original STOI measure leads to decent performance across all datasets. We therefore conclude that, while the prediction performance of the STOI measure can be improved considerably under some conditions by using fitted BIFs, this should be done with caution.
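The three performance measures used in the cross-evaluation can be sketched in plain Python. This is an illustration, not the code used in the study. RMSE is only meaningful when predictions and measurements share a scale, so the sketch assumes the predictions have already been mapped onto the scale of the measured intelligibility scores. Pearson correlation is insensitive to linear rescaling, and Kendall's tau depends only on rank order. The sketch also assumes the tau-a variant of Kendall's coefficient, since the abstract does not state which variant was used. All function names are hypothetical.

```python
import math
from typing import Sequence


def rmse(pred: Sequence[float], meas: Sequence[float]) -> float:
    """Root-mean-squared error between predicted and measured intelligibility."""
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(pred, meas)) / len(pred))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson (linear) correlation coefficient; assumes non-constant inputs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall rank correlation, tau-a variant:
    (concordant pairs - discordant pairs) / total number of pairs.
    Tied pairs count as neither; tau-b would additionally correct for ties."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            if prod > 0:
                s += 1  # concordant pair
            elif prod < 0:
                s -= 1  # discordant pair
    return s / (n * (n - 1) / 2)


def evaluate(pred: Sequence[float], meas: Sequence[float]) -> dict:
    """All three measures for one (fitted BIF, evaluation dataset) pair."""
    return {"rmse": rmse(pred, meas),
            "pearson": pearson(pred, meas),
            "kendall": kendall_tau(pred, meas)}
```

In a cross-evaluation, `evaluate` would be called once for every combination of the dataset used to fit a BIF and the dataset used to evaluate it. The diagonal of the resulting table corresponds to fitting and evaluating on the same data.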

Original language	English
Publication date	5 Jan. 2017
Status	Published - 5 Jan. 2017
Event	The 9th Speech in Noise Workshop - University of Oldenburg, Oldenburg, Germany Duration: 5 Jan. 2017 → 6 Jan. 2017 http://spin2017.de/

Workshop

Workshop	The 9th Speech in Noise Workshop
Location	University of Oldenburg
Country/Territory	Germany
City	Oldenburg
Period	05/01/2017 → 06/01/2017
Web address	http://spin2017.de/

Other files and links

http://spin2017.de/?p=program&id=29

Citation formats

@conference{21cece063e1d4531a616d9a1ac0656dc,

title = "Performance evaluation of the short-time objective intelligibility measure with different band importance functions",

author = "{Heidemann Andersen}, Asger and {de Haan}, {Jan Mark} and Zheng-Hua Tan and Jesper Jensen",

year = "2017",

month = jan,

day = "5",

language = "English",

note = "The 9th Speech in Noise Workshop ; Conference date: 05-01-2017 Through 06-01-2017",

url = "http://spin2017.de/",

}

TY - CONF

T1 - Performance evaluation of the short-time objective intelligibility measure with different band importance functions

AU - Heidemann Andersen, Asger

AU - de Haan, Jan Mark

AU - Tan, Zheng-Hua

AU - Jensen, Jesper

PY - 2017/1/5

Y1 - 2017/1/5

UR - http://spin2017.de/?p=program&id=29

M3 - Poster

T2 - The 9th Speech in Noise Workshop

Y2 - 5 January 2017 through 6 January 2017

ER -