TY - GEN
T1 - A Supervised Approach to Global Signal-to-Noise Ratio Estimation for Whispered and Pathological Voices
AU - Poorjam, Amir Hossein
AU - Little, Max A
AU - Jensen, Jesper Rindom
AU - Christensen, Mads Græsbøll
PY - 2018/9/10
Y1 - 2018/9/10
AB - The presence of background noise in signals adversely affects the performance of many speech-based algorithms. Accurate estimation of the signal-to-noise ratio (SNR), as a measure of the noise level in a signal, can help in compensating for noise effects. Most existing SNR estimation methods have been developed for normal speech and might not provide accurate estimates for special speech types such as whispered or disordered voices, particularly when they are corrupted by non-stationary noise. In this paper, we first investigate the impact of stationary and non-stationary noise on the behavior of mel-frequency cepstral coefficients (MFCCs) extracted from normal, whispered and pathological voices. We demonstrate that, regardless of the speech type, the mean and the covariance of MFCCs are predictably modified by additive noise, and that the amount of change is related to the noise level. We then propose a new supervised method for SNR estimation based on a regression model trained on MFCCs of the noisy signals. Experimental results show that the proposed approach provides accurate estimation and consistent performance for various speech types under different noise conditions.
KW - Global SNR estimation
KW - MFCC
KW - Pathological voice
KW - Support vector regression
KW - Whispered speech
UR - http://www.scopus.com/inward/record.url?scp=85054227441&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2018.8462459
DO - 10.1109/ICASSP.2018.8462459
M3 - Article in proceeding
T3 - IEEE International Conference on Acoustics, Speech and Signal Processing. Proceedings
SP - 296
EP - 300
BT - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
PB - IEEE
T2 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Y2 - 15 April 2018 through 20 April 2018
ER -