Performance evaluation of the short-time objective intelligibility measure with different band importance functions

Asger Heidemann Andersen, Jan Mark de Haan, Zheng-Hua Tan, Jesper Jensen

Research output: Contribution to conference without publisher/journal > Poster > Research

Abstract

Methods for speech intelligibility prediction are quickly becoming popular tools within the speech processing community. Such methods can easily and objectively estimate the effectiveness of different speech enhancement schemes. The short-time objective intelligibility (STOI) measure has enjoyed particular popularity due to its simplicity and its proven ability to provide accurate predictions across a wide range of conditions. The STOI measure has a simple structure, similar to that of many other intelligibility measures: 1) clean and degraded speech signals are split into one-third octave bands with a filter bank, 2) envelopes are extracted from each band, 3) the temporal correlation between clean and degraded envelopes is computed in short time segments, and 4) the correlation is averaged across time and frequency bands to obtain the final output. An unusual choice in the design of the STOI measure is that all frequency bands are equally weighted in the final measure. This contrasts with classical methods such as the speech intelligibility index (SII), which employs empirically determined band importance functions (BIFs) that specify the relative contribution of each frequency band to intelligibility.
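
Steps 3 and 4 of the structure described above can be illustrated with a minimal sketch. It assumes the band envelopes have already been extracted (steps 1 and 2) and are given as arrays of shape (bands, frames). The function names, the segment length parameter, and the sliding-window layout are illustrative assumptions, not the authors' implementation, and the full STOI measure contains further processing that this abstract does not describe.

```python
import numpy as np

def short_time_correlation(clean_env, degraded_env, seg_len):
    """Correlate clean and degraded envelopes per band in short segments.

    clean_env, degraded_env: arrays of shape (n_bands, n_frames).
    Returns an array of shape (n_bands, n_segments).
    """
    n_bands, n_frames = clean_env.shape
    n_segments = n_frames - seg_len + 1
    d = np.zeros((n_bands, n_segments))
    for m in range(n_segments):
        x = clean_env[:, m:m + seg_len]
        y = degraded_env[:, m:m + seg_len]
        # Remove the per-band mean within the segment.
        x = x - x.mean(axis=1, keepdims=True)
        y = y - y.mean(axis=1, keepdims=True)
        denom = np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1)
        # Guard against division by zero for constant segments.
        d[:, m] = np.sum(x * y, axis=1) / np.maximum(denom, 1e-12)
    return d

def band_weighted_score(d, bif=None):
    """Average correlations over time, then weight the bands by a BIF.

    bif=None gives the uniform weighting described for the original STOI.
    """
    n_bands = d.shape[0]
    if bif is None:
        bif = np.full(n_bands, 1.0 / n_bands)
    bif = np.asarray(bif, dtype=float)
    bif = bif / bif.sum()  # normalize so the weights sum to one
    return float(bif @ d.mean(axis=1))
```

With `bif=None` every band contributes equally; passing a non-uniform `bif` is the only change needed to move from the uniform weighting to an SII-style band importance weighting in this sketch.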

In this study we investigated the use of BIFs in the STOI measure. BIFs were fitted to several datasets of measured intelligibility so as to minimize the root-mean-squared prediction error. We then performed a cross-evaluation of the obtained BIFs on all datasets, using three different performance measures: root-mean-squared error, Pearson correlation, and Kendall rank correlation. The results show substantially improved performance when fitting and evaluating on the same dataset. However, this advantage does not necessarily persist when fitting and evaluating on different datasets: when the datasets used for fitting and evaluation differ greatly, poor performance may result. In contrast, the uniform BIF used in the original STOI measure leads to decent performance across all datasets. We therefore conclude that, while the prediction performance of the STOI measure can be improved considerably under some conditions by the use of fitted BIFs, this should be done with caution.
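
The three performance measures named above can be sketched as follows. This is a minimal illustration rather than the evaluation code used in the study. The function names are hypothetical, and the Kendall variant shown is tau-a, which applies no tie correction. The abstract does not say which variant was used.

```python
import numpy as np

def rmse(predicted, measured):
    """Root-mean-squared error between predicted and measured intelligibility."""
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    return float(np.sqrt(np.mean((predicted - measured) ** 2)))

def pearson(predicted, measured):
    """Pearson (linear) correlation coefficient."""
    return float(np.corrcoef(predicted, measured)[0, 1])

def kendall_tau_a(predicted, measured):
    """Kendall rank correlation (tau-a, no tie correction)."""
    p = np.asarray(predicted, dtype=float)
    q = np.asarray(measured, dtype=float)
    n = len(p)
    # Sign of every pairwise difference; concordant pairs multiply to +1,
    # discordant pairs to -1, and tied pairs to 0.
    dp = np.sign(p[:, None] - p[None, :])
    dq = np.sign(q[:, None] - q[None, :])
    # The double sum counts each unordered pair twice, hence n * (n - 1).
    return float(np.sum(dp * dq) / (n * (n - 1)))
```

In a cross-evaluation of the kind described, each of these functions would be applied to every combination of fitting dataset and evaluation dataset. RMSE is sensitive to the absolute scale of the predictions, whereas the two correlation measures are not.
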
Original language: English
Publication date: 5 Jan 2017
Publication status: Published - 5 Jan 2017
Event: The 9th Speech in Noise Workshop - University of Oldenburg, Oldenburg, Germany
Duration: 5 Jan 2017 to 6 Jan 2017
http://spin2017.de/

Workshop

Workshop: The 9th Speech in Noise Workshop
Location: University of Oldenburg
Country: Germany
City: Oldenburg
Period: 05/01/2017 to 06/01/2017

Cite this

Heidemann Andersen, A., de Haan, J. M., Tan, Z-H., & Jensen, J. (2017). Performance evaluation of the short-time objective intelligibility measure with different band importance functions. Poster session presented at The 9th Speech in Noise Workshop, Oldenburg, Germany.
@conference{21cece063e1d4531a616d9a1ac0656dc,
title = "Performance evaluation of the short-time objective intelligibility measure with different band importance functions",
author = "{Heidemann Andersen}, Asger and {de Haan}, {Jan Mark} and Zheng-Hua Tan and Jesper Jensen",
year = "2017",
month = "1",
day = "5",
language = "English",
note = "Conference date: 05-01-2017 through 06-01-2017",
url = "http://spin2017.de/",

}


TY - CONF

T1 - Performance evaluation of the short-time objective intelligibility measure with different band importance functions

AU - Heidemann Andersen, Asger

AU - de Haan, Jan Mark

AU - Tan, Zheng-Hua

AU - Jensen, Jesper

PY - 2017/1/5

Y1 - 2017/1/5

UR - http://spin2017.de/?p=program&id=29

M3 - Poster

ER -
