A linear time-invariant filter is designed to improve speech understanding when the speech is played back in a noisy environment. To accomplish this, the speech intelligibility index (SII) is maximized under the constraint that the speech energy is held constant. A nonlinear approximation of the SII is used so that the constrained optimization problem has a closed-form solution. The resulting filter depends on the long-term average noise and speech spectra as well as the global SNR and, in general, has a high-pass characteristic. In contrast to existing methods, the proposed filter sets certain frequency bands to zero when they no longer contribute to intelligibility. Experiments show large intelligibility improvements with the proposed method in stationary speech-shaped noise. However, the method does not perform well for speech corrupted by a competing speaker, because the SII is not a reliable intelligibility predictor for fluctuating noise sources. MATLAB code is provided.
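To make the idea of an energy-constrained, band-wise optimum concrete, the following is a minimal sketch in Python and not the MATLAB code the authors provide. It assumes one particular surrogate for the SII, a weighted sum of SNR / (SNR + 1) over frequency bands, which may differ from the approximation used in the paper. The function name `energy_constrained_band_gains` and the arguments `speech_energy`, `noise_energy`, and `band_importance` are hypothetical, and all inputs are assumed to be strictly positive.

```python
import numpy as np


def energy_constrained_band_gains(speech_energy, noise_energy, band_importance):
    """Per-band amplitude gains that maximize sum_i w_i * e_i / (e_i + n_i)
    subject to sum_i e_i == sum_i s_i and e_i >= 0.

    s_i: long-term speech energy in band i (before filtering)
    n_i: long-term noise energy in band i
    w_i: band-importance weight (assumed surrogate, not the full SII)
    e_i: speech energy in band i after filtering
    """
    s = np.asarray(speech_energy, dtype=float)
    n = np.asarray(noise_energy, dtype=float)
    w = np.asarray(band_importance, dtype=float)
    total = s.sum()  # energy budget: output energy equals input energy

    active = np.ones(s.shape, dtype=bool)
    while True:
        # Stationarity of the Lagrangian gives e_i = c * sqrt(w_i * n_i) - n_i
        # on the active bands; c follows from the energy constraint.
        c = (total + n[active].sum()) / np.sqrt(w[active] * n[active]).sum()
        e = np.where(active, c * np.sqrt(w * n) - n, 0.0)
        negative = e < 0
        if not negative.any():
            break
        # Bands that would need negative energy are switched off entirely,
        # and the budget is redistributed over the remaining bands.
        active &= ~negative

    return np.sqrt(e / s)  # amplitude gain per band


# Illustrative numbers only: band 0 is dominated by noise.
gains = energy_constrained_band_gains([1.0, 1.0, 1.0], [100.0, 1.0, 1.0], [1.0, 1.0, 1.0])
```

In this toy case the heavily masked first band receives zero gain and its energy is moved to the two remaining bands, which illustrates how an energy-preserving optimum can set bands to zero. With noise that is strongest at low frequencies, the same mechanism would tend toward a high-pass shape.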
|Journal||Proceedings of the International Conference on Spoken Language Processing|
|Number of pages||6|
|Publication status||Published - 2013|
|Event||Interspeech 2013 - Lyon, France|
|Period||25/08/2013 → 29/08/2013|