A Supervised Approach to Global Signal-to-Noise Ratio Estimation for Whispered and Pathological Voices

Publication: Contribution to book/anthology/report/conference proceeding; Article in proceeding; Research; peer-reviewed

4 Citations (Scopus)
77 Downloads (Pure)

Abstract

The presence of background noise in signals adversely affects the performance of many speech-based algorithms. Accurate estimation of the signal-to-noise ratio (SNR), as a measure of noise level in a signal, can help in compensating for noise effects. Most existing SNR estimation methods have been developed for normal speech and might not provide accurate estimation for special speech types such as whispered or disordered voices, particularly when they are corrupted by non-stationary noises. In this paper, we first investigate the impact of stationary and non-stationary noise on the behavior of mel-frequency cepstral coefficients (MFCCs) extracted from normal, whispered and pathological voices. We demonstrate that, regardless of the speech type, the mean and the covariance of MFCCs are predictably modified by additive noise and the amount of change is related to the noise level. Then, we propose a new supervised method for SNR estimation which is based on a regression model trained on MFCCs of the noisy signals. Experimental results show that the proposed approach provides accurate estimation and consistent performance for various speech types under different noise conditions.
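
A supervised SNR estimator needs noisy signals whose true global SNR is known. The sketch below shows one common way to produce such labeled data: scale a noise recording so that the mixture reaches a chosen global SNR. It is only an illustration and is not taken from the paper; the function names `global_snr_db` and `mix_at_snr`, the sine-wave stand-in for speech, and the 16 kHz sampling rate are all assumptions.

```python
import numpy as np


def global_snr_db(clean: np.ndarray, noise: np.ndarray) -> float:
    """Global SNR in dB: total clean-signal energy over total noise energy."""
    signal_energy = np.sum(clean.astype(np.float64) ** 2)
    noise_energy = np.sum(noise.astype(np.float64) ** 2)
    return float(10.0 * np.log10(signal_energy / noise_energy))


def mix_at_snr(clean: np.ndarray, noise: np.ndarray, target_snr_db: float):
    """Scale the noise so that clean + scaled noise has the requested global SNR.

    Returns the noisy mixture and the scaled noise (hypothetical helper).
    """
    current_snr_db = global_snr_db(clean, noise)
    # Multiplying the noise amplitude by g lowers the SNR by 20*log10(g) dB.
    gain = 10.0 ** ((current_snr_db - target_snr_db) / 20.0)
    scaled_noise = gain * noise
    return clean + scaled_noise, scaled_noise


# Stand-in signals (assumptions): a 220 Hz tone instead of speech, white noise.
rng = np.random.default_rng(0)
sample_rate = 16000
clean = np.sin(2 * np.pi * 220 * np.arange(sample_rate) / sample_rate)
noise = rng.standard_normal(sample_rate)

noisy, scaled_noise = mix_at_snr(clean, noise, target_snr_db=5.0)
achieved_snr_db = global_snr_db(clean, scaled_noise)
```

In a pipeline like the one the abstract describes, features such as MFCC statistics would then be extracted from `noisy`, and the target SNR would serve as the regression label.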
Original language: English
Title: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Number of pages: 5
Publisher: IEEE
Publication date: 10 Sep 2018
Pages: 296-300
Article number: 8462459
ISBN (Electronic): 978-1-5386-4658-8
DOI: https://doi.org/10.1109/ICASSP.2018.8462459
Status: Published - 10 Sep 2018
Event: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) - Calgary, Canada
Duration: 15 Apr 2018 to 20 Apr 2018
https://2018.ieeeicassp.org/

Conference

Conference: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Country: Canada
City: Calgary
Period: 15/04/2018 to 20/04/2018
Internet address: https://2018.ieeeicassp.org/
Name: I E E E International Conference on Acoustics, Speech and Signal Processing. Proceedings
ISSN: 1520-6149

Fingerprint

Signal to noise ratio
Additive noise

Cite this

Poorjam, A. H., Little, M. A., Jensen, J. R., & Christensen, M. G. (2018). A Supervised Approach to Global Signal-to-Noise Ratio Estimation for Whispered and Pathological Voices. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 296-300). [8462459] IEEE. I E E E International Conference on Acoustics, Speech and Signal Processing. Proceedings https://doi.org/10.1109/ICASSP.2018.8462459
Poorjam, Amir Hossein ; Little, Max A ; Jensen, Jesper Rindom ; Christensen, Mads Græsbøll. / A Supervised Approach to Global Signal-to-Noise Ratio Estimation for Whispered and Pathological Voices. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018. pp. 296-300 (I E E E International Conference on Acoustics, Speech and Signal Processing. Proceedings).
@inproceedings{e0e5d796b29b4950b7f948b8042f6660,
title = "A Supervised Approach to Global Signal-to-Noise Ratio Estimation for Whispered and Pathological Voices",
abstract = "The presence of background noise in signals adversely affects the performance of many speech-based algorithms. Accurate estimation of signal-to-noise-ratio (SNR), as a measure of noise level in a signal, can help in compensating for noise effects. Most existing SNR estimation methods have been developed for normal speech and might not provide accurate estimation for special speech types such as whispered or disordered voices, particularly, when they are corrupted by non-stationary noises. In this paper, we first investigate the impact of stationary and non-stationary noise on the behavior of mel-frequency cepstral coefficients (MFCCs) extracted from normal, whispered and pathological voices. We demonstrate that, regardless of the speech type, the mean and the covariance of MFCCs are predictably modified by additive noise and the amount of change is related to the noise level. Then, we propose a new supervised method for SNR estimation which is based on a regression model trained on MFCCs of the noisy signals. Experimental results show that the proposed approach provides accurate estimation and consistent performance for various speech types under different noise conditions.",
keywords = "Global SNR estimation, MFCC, Pathological voice, Support vector regression, Whispered speech",
author = "Poorjam, {Amir Hossein} and Little, {Max A} and Jensen, {Jesper Rindom} and Christensen, {Mads Gr{\ae}sb{\o}ll}",
year = "2018",
month = "9",
day = "10",
doi = "10.1109/ICASSP.2018.8462459",
language = "English",
pages = "296--300",
booktitle = "2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)",
publisher = "IEEE",
address = "United States",

}

Poorjam, AH, Little, MA, Jensen, JR & Christensen, MG 2018, A Supervised Approach to Global Signal-to-Noise Ratio Estimation for Whispered and Pathological Voices. in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 8462459, IEEE, I E E E International Conference on Acoustics, Speech and Signal Processing. Proceedings, pp. 296-300, Calgary, Canada, 15/04/2018. https://doi.org/10.1109/ICASSP.2018.8462459

A Supervised Approach to Global Signal-to-Noise Ratio Estimation for Whispered and Pathological Voices. / Poorjam, Amir Hossein; Little, Max A; Jensen, Jesper Rindom; Christensen, Mads Græsbøll.

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018. pp. 296-300 8462459 (I E E E International Conference on Acoustics, Speech and Signal Processing. Proceedings).

TY - GEN

T1 - A Supervised Approach to Global Signal-to-Noise Ratio Estimation for Whispered and Pathological Voices

AU - Poorjam, Amir Hossein

AU - Little, Max A

AU - Jensen, Jesper Rindom

AU - Christensen, Mads Græsbøll

PY - 2018/9/10

Y1 - 2018/9/10

AB - The presence of background noise in signals adversely affects the performance of many speech-based algorithms. Accurate estimation of signal-to-noise-ratio (SNR), as a measure of noise level in a signal, can help in compensating for noise effects. Most existing SNR estimation methods have been developed for normal speech and might not provide accurate estimation for special speech types such as whispered or disordered voices, particularly, when they are corrupted by non-stationary noises. In this paper, we first investigate the impact of stationary and non-stationary noise on the behavior of mel-frequency cepstral coefficients (MFCCs) extracted from normal, whispered and pathological voices. We demonstrate that, regardless of the speech type, the mean and the covariance of MFCCs are predictably modified by additive noise and the amount of change is related to the noise level. Then, we propose a new supervised method for SNR estimation which is based on a regression model trained on MFCCs of the noisy signals. Experimental results show that the proposed approach provides accurate estimation and consistent performance for various speech types under different noise conditions.

KW - Global SNR estimation

KW - MFCC

KW - Pathological voice

KW - Support vector regression

KW - Whispered speech

UR - http://www.scopus.com/inward/record.url?scp=85054227441&partnerID=8YFLogxK

U2 - 10.1109/ICASSP.2018.8462459

DO - 10.1109/ICASSP.2018.8462459

M3 - Article in proceeding

SP - 296

EP - 300

BT - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

PB - IEEE

ER -

Poorjam AH, Little MA, Jensen JR, Christensen MG. A Supervised Approach to Global Signal-to-Noise Ratio Estimation for Whispered and Pathological Voices. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. 2018. p. 296-300. 8462459. (I E E E International Conference on Acoustics, Speech and Signal Processing. Proceedings). https://doi.org/10.1109/ICASSP.2018.8462459