Performance evaluation of the short-time objective intelligibility measure with different band importance functions

Asger Heidemann Andersen, Jan Mark de Haan, Zheng-Hua Tan, Jesper Jensen

Publication: Conference contribution without publisher/journal › Poster › Research

Abstract

Methods for speech intelligibility prediction are quickly becoming popular tools within the speech processing community. Such methods can easily and objectively estimate the effectiveness of different speech enhancement schemes. The short-time objective intelligibility (STOI) measure has enjoyed particular popularity due to its simplicity and its proven ability to provide accurate predictions across a wide range of conditions. The STOI measure has a simple structure, similar to that of many other intelligibility measures: 1) clean and degraded speech signals are split into one-third octave bands with a filter bank, 2) envelopes are extracted from each band, 3) the temporal correlation between clean and degraded envelopes is computed in short time segments, and 4) the correlation is averaged across time and frequency bands to obtain the final output. An unusual choice in the design of the STOI measure is that all frequency bands are equally weighted in the final measure. This is in contrast to classical methods such as the speech intelligibility index (SII), which employs empirically determined band importance functions (BIFs) specifying the relative contribution of each frequency band to intelligibility.
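
Steps 3 and 4 above, with an optional BIF replacing the equal band weighting, can be illustrated with a minimal Python sketch. It is not the reference STOI implementation: it assumes the band envelopes from steps 1 and 2 are already available as arrays, omits any normalization or clipping that the actual measure applies, and the function names and the segment length of 30 frames are illustrative assumptions.

```python
import numpy as np

def segment_correlations(clean_env, degraded_env, seg_len=30):
    """Correlation between clean and degraded envelopes per band and segment.

    clean_env, degraded_env: arrays of shape (n_bands, n_frames).
    seg_len: frames per sliding segment (illustrative default, an assumption).
    Returns an array of shape (n_bands, n_frames - seg_len + 1).
    """
    n_bands, n_frames = clean_env.shape
    n_segs = n_frames - seg_len + 1
    corr = np.zeros((n_bands, n_segs))
    for m in range(n_segs):
        x = clean_env[:, m:m + seg_len]
        y = degraded_env[:, m:m + seg_len]
        # Remove the per-band mean within the segment.
        x = x - x.mean(axis=1, keepdims=True)
        y = y - y.mean(axis=1, keepdims=True)
        denom = np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1)
        # Guard against division by zero for constant envelopes.
        corr[:, m] = np.sum(x * y, axis=1) / np.maximum(denom, 1e-12)
    return corr

def weighted_score(corr, bif=None):
    """Average correlations over time, then weight bands by a BIF.

    bif=None gives the uniform weighting described for the original measure.
    """
    n_bands = corr.shape[0]
    if bif is None:
        bif = np.full(n_bands, 1.0 / n_bands)
    bif = np.asarray(bif, dtype=float)
    bif = bif / bif.sum()          # normalize weights to sum to one
    band_means = corr.mean(axis=1)  # average across time segments
    return float(bif @ band_means)
```

Calling `weighted_score(corr)` without a BIF reproduces the plain average over bands and segments, while passing a non-uniform `bif` lets some bands contribute more than others, in the spirit of the SII.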

In this study we investigated the use of BIFs in the STOI measure. BIFs were fitted to several datasets of measured intelligibility so as to minimize the root-mean-squared prediction error. We then performed a cross-evaluation of the obtained BIFs on all datasets, using three different performance measures: root-mean-squared error, Pearson correlation, and Kendall rank correlation. The results show substantially improved performance when fitting and evaluating on the same dataset. However, this advantage does not necessarily persist when fitting and evaluating on different datasets. When the datasets used for fitting and evaluating differ greatly, poor performance may result. In contrast, the uniform BIF used in the original STOI measure leads to decent performance across all datasets. We therefore conclude that, while prediction performance of the STOI measure can be improved considerably under some conditions by the use of fitted BIFs, this should be done with caution.
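
The fit-then-cross-evaluate procedure can be sketched as follows. The abstract does not specify how band scores are mapped to intelligibility or how the optimization was carried out, so this sketch assumes a purely linear mapping fitted by unconstrained least squares (which minimizes the squared error and therefore also the root-mean-squared error). All function names and the data layout are hypothetical.

```python
import numpy as np

def fit_bif(band_scores, measured):
    """Least-squares band weights (assumed linear model, unconstrained).

    band_scores: (n_conditions, n_bands) per-band scores.
    measured:    (n_conditions,) measured intelligibility.
    """
    weights, *_ = np.linalg.lstsq(band_scores, measured, rcond=None)
    return weights

def rmse(pred, target):
    """Root-mean-squared error."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def pearson(pred, target):
    """Pearson correlation coefficient."""
    return float(np.corrcoef(pred, target)[0, 1])

def kendall_tau(pred, target):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    i, j = np.triu_indices(len(pred), k=1)
    s = np.sign(pred[i] - pred[j]) * np.sign(target[i] - target[j])
    return float(s.sum() / len(i))

def cross_evaluate(datasets):
    """Fit on each dataset and evaluate on every dataset.

    datasets: dict mapping name -> (band_scores, measured).
    Returns dict mapping (fit_name, eval_name) -> (rmse, pearson, kendall).
    """
    results = {}
    for fit_name, (x_fit, y_fit) in datasets.items():
        w = fit_bif(x_fit, y_fit)
        for eval_name, (x_ev, y_ev) in datasets.items():
            pred = x_ev @ w
            results[(fit_name, eval_name)] = (
                rmse(pred, y_ev),
                pearson(pred, y_ev),
                kendall_tau(pred, y_ev),
            )
    return results
```

Entries with `fit_name == eval_name` correspond to fitting and evaluating on the same dataset; the off-diagonal entries show how well a fitted BIF transfers to other data. A more careful version would likely constrain the weights (for example, non-negative and summing to one), which this sketch does not do.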
Original language: English
Publication date: 5 Jan 2017
Status: Published - 5 Jan 2017
Event: The 9th Speech in Noise Workshop - University of Oldenburg, Oldenburg, Germany
Duration: 5 Jan 2017 to 6 Jan 2017
http://spin2017.de/

Workshop

Workshop: The 9th Speech in Noise Workshop
Location: University of Oldenburg
Country: Germany
City: Oldenburg
Period: 05/01/2017 to 06/01/2017

Cite this

Heidemann Andersen, A., de Haan, J. M., Tan, Z-H., & Jensen, J. (2017). Performance evaluation of the short-time objective intelligibility measure with different band importance functions. Poster presented at The 9th Speech in Noise Workshop, Oldenburg, Germany.
Heidemann Andersen, Asger ; de Haan, Jan Mark ; Tan, Zheng-Hua ; Jensen, Jesper. / Performance evaluation of the short-time objective intelligibility measure with different band importance functions. Poster presented at The 9th Speech in Noise Workshop, Oldenburg, Germany.
@conference{21cece063e1d4531a616d9a1ac0656dc,
title = "Performance evaluation of the short-time objective intelligibility measure with different band importance functions",
author = "{Heidemann Andersen}, Asger and {de Haan}, {Jan Mark} and Zheng-Hua Tan and Jesper Jensen",
year = "2017",
month = "1",
day = "5",
language = "English",
note = "Conference date: 05-01-2017 Through 06-01-2017",
url = "http://spin2017.de/",

}

Heidemann Andersen, A, de Haan, JM, Tan, Z-H & Jensen, J 2017, 'Performance evaluation of the short-time objective intelligibility measure with different band importance functions', The 9th Speech in Noise Workshop, Oldenburg, Germany, 05/01/2017 - 06/01/2017.

Performance evaluation of the short-time objective intelligibility measure with different band importance functions. / Heidemann Andersen, Asger; de Haan, Jan Mark; Tan, Zheng-Hua; Jensen, Jesper.

2017. Poster presented at The 9th Speech in Noise Workshop, Oldenburg, Germany.


TY - CONF

T1 - Performance evaluation of the short-time objective intelligibility measure with different band importance functions

AU - Heidemann Andersen, Asger

AU - de Haan, Jan Mark

AU - Tan, Zheng-Hua

AU - Jensen, Jesper

PY - 2017/1/5

Y1 - 2017/1/5

UR - http://spin2017.de/?p=program&id=29

M3 - Poster

ER -

Heidemann Andersen A, de Haan JM, Tan Z-H, Jensen J. Performance evaluation of the short-time objective intelligibility measure with different band importance functions. 2017. Poster presented at The 9th Speech in Noise Workshop, Oldenburg, Germany.