Frame Selection for Robust Speaker Identification: A Hybrid Approach

Swati Prasad; Zheng Hua Tan; Ramjee Prasad

doi:10.1007/s11277-017-4544-1

Swati Prasad^*, Zheng Hua Tan, Ramjee Prasad

^*Corresponding author

Publication: Contribution to journal › Journal article › Research › peer-review

4 Citations (Scopus)

Abstract

Identifying a person by voice is a challenging task in the presence of environmental noise. Selecting important and reliable frames for feature extraction from the noisy time-domain speech signal can play a significant role in improving speaker identification accuracy. This paper therefore presents a frame selection method based on a hybrid technique that combines voice activity detection (VAD) and variable frame rate (VFR) analysis. It efficiently captures the active speech part and the changes in the temporal characteristics of the speech signal, taking into account the signal-to-noise ratio (SNR), and thereby captures speaker-specific information. Experiments on noisy speech, generated by artificially adding various noise signals to the clean YOHO speech at different SNRs, show that frame selection by the hybrid technique gives better results than either of its two constituent techniques used alone. The proposed hybrid technique outperformed both VFR and the widely used Gaussian statistical model based VAD method for all noise scenarios at different SNRs, except for babble-noise-corrupted speech at 5 dB SNR, for which VFR performed better. Averaged over the identification accuracies of the different noise scenarios, relative improvements of 9.79% over VFR and 18.05% over the Gaussian statistical model based VAD method were achieved.
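The abstract does not spell out how the two techniques are combined, so the following is only a minimal illustrative sketch and not the authors' algorithm. It assumes a toy energy-threshold VAD (the paper uses a Gaussian statistical model based VAD as a baseline), a toy VFR rule that keeps a frame once the accumulated change in log energy reaches a threshold, and a simple intersection of the two selections. All function names, parameters, and default values are hypothetical, and the SNR weighting mentioned in the abstract is not modelled.

```python
import math


def frame_energies(signal, frame_len, hop):
    """Log energy of each fixed-rate frame of the signal."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        mean_power = sum(x * x for x in frame) / frame_len
        energies.append(math.log(mean_power + 1e-12))  # avoid log(0)
    return energies


def vad_mask(energies, margin):
    """Toy VAD (assumption): a frame is active if its log energy
    exceeds the quietest frame by more than `margin`."""
    floor = min(energies)
    return [e > floor + margin for e in energies]


def vfr_select(energies, threshold):
    """Toy VFR (assumption): keep a frame whenever the accumulated
    frame-to-frame change in log energy reaches `threshold`."""
    selected = [0] if energies else []
    accumulated = 0.0
    for i in range(1, len(energies)):
        accumulated += abs(energies[i] - energies[i - 1])
        if accumulated >= threshold:
            selected.append(i)
            accumulated = 0.0
    return selected


def hybrid_select(signal, frame_len=200, hop=80, margin=3.0, threshold=1.0):
    """Hypothetical hybrid: frames chosen by VFR that the VAD marks active."""
    energies = frame_energies(signal, frame_len, hop)
    if not energies:
        return []
    active = vad_mask(energies, margin)
    return [i for i in vfr_select(energies, threshold) if active[i]]


if __name__ == "__main__":
    # Synthetic example: silence, a 100 Hz tone sampled at 8 kHz, silence.
    tone = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(800)]
    signal = [0.0] * 800 + tone + [0.0] * 800
    print(hybrid_select(signal))
```

In this sketch the VFR step concentrates the selected frames around onsets and offsets, where the signal changes quickly, while the VAD mask discards any selected frame that falls in silence.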

Original language	English
Journal	Wireless Personal Communications
Volume	97
Issue number	1
Pages (from-to)	933-950
Number of pages	18
ISSN	0929-6212
DOI	https://doi.org/10.1007/s11277-017-4544-1
Status	Published - Nov. 2017

Other files and links

http://www.scopus.com/inward/record.url?scp=85019683611&partnerID=8YFLogxK

Citation formats

@article{88559c1d7c334add8ddf9735bac5cbfd,

title = "Frame Selection for Robust Speaker Identification: A Hybrid Approach",

abstract = "Identification of a person using voice is a challenging task under environmental noises. Important and reliable frame selection for feature extraction from the time-domain speech signal under noise can play a significant role in improving speaker identification accuracy. Therefore, this paper presents a frame selection method using hybrid technique, which combines two techniques, namely, voice activity detection (VAD) and variable frame rate (VFR) analysis. It efficiently captures the active speech part, the changes in the temporal characteristics of the speech signal, taking into account the signal-to-noise ratio, and thereby speaker-specific information. Experimental results on noisy speech, generated by artificially adding various noise signals to the clean YOHO speech at different SNRs have shown improved results for the frame selection by the hybrid technique in comparison with any one of the techniques used for the hybrid. The proposed hybrid technique outperformed both the VFR and the widely used Gaussian statistical model based VAD method for all noise scenarios at different SNRs, except for the Babble noise corrupted speech at 5 dB SNR, for which, VFR performed better. Considering the average identification accuracies of different noise scenarios, a relative improvement of 9.79% over the VFR, and 18.05% over the Gaussian statistical model based VAD method has been achieved.",

keywords = "Biometric, Frame selection, Robust speaker identification, Variable frame rate (VFR)",

author = "Swati Prasad and Tan, {Zheng Hua} and Ramjee Prasad",

year = "2017",

month = nov,

doi = "10.1007/s11277-017-4544-1",

language = "English",

volume = "97",

pages = "933--950",

journal = "Wireless Personal Communications",

issn = "0929-6212",

publisher = "Springer",

number = "1",

}

TY - JOUR

T1 - Frame Selection for Robust Speaker Identification

T2 - A Hybrid Approach

AU - Prasad, Swati

AU - Tan, Zheng Hua

AU - Prasad, Ramjee

PY - 2017/11

Y1 - 2017/11

N2 - Identification of a person using voice is a challenging task under environmental noises. Important and reliable frame selection for feature extraction from the time-domain speech signal under noise can play a significant role in improving speaker identification accuracy. Therefore, this paper presents a frame selection method using hybrid technique, which combines two techniques, namely, voice activity detection (VAD) and variable frame rate (VFR) analysis. It efficiently captures the active speech part, the changes in the temporal characteristics of the speech signal, taking into account the signal-to-noise ratio, and thereby speaker-specific information. Experimental results on noisy speech, generated by artificially adding various noise signals to the clean YOHO speech at different SNRs have shown improved results for the frame selection by the hybrid technique in comparison with any one of the techniques used for the hybrid. The proposed hybrid technique outperformed both the VFR and the widely used Gaussian statistical model based VAD method for all noise scenarios at different SNRs, except for the Babble noise corrupted speech at 5 dB SNR, for which, VFR performed better. Considering the average identification accuracies of different noise scenarios, a relative improvement of 9.79% over the VFR, and 18.05% over the Gaussian statistical model based VAD method has been achieved.

AB - Identification of a person using voice is a challenging task under environmental noises. Important and reliable frame selection for feature extraction from the time-domain speech signal under noise can play a significant role in improving speaker identification accuracy. Therefore, this paper presents a frame selection method using hybrid technique, which combines two techniques, namely, voice activity detection (VAD) and variable frame rate (VFR) analysis. It efficiently captures the active speech part, the changes in the temporal characteristics of the speech signal, taking into account the signal-to-noise ratio, and thereby speaker-specific information. Experimental results on noisy speech, generated by artificially adding various noise signals to the clean YOHO speech at different SNRs have shown improved results for the frame selection by the hybrid technique in comparison with any one of the techniques used for the hybrid. The proposed hybrid technique outperformed both the VFR and the widely used Gaussian statistical model based VAD method for all noise scenarios at different SNRs, except for the Babble noise corrupted speech at 5 dB SNR, for which, VFR performed better. Considering the average identification accuracies of different noise scenarios, a relative improvement of 9.79% over the VFR, and 18.05% over the Gaussian statistical model based VAD method has been achieved.

KW - Biometric

KW - Frame selection

KW - Robust speaker identification

KW - Variable frame rate (VFR)

UR - http://www.scopus.com/inward/record.url?scp=85019683611&partnerID=8YFLogxK

U2 - 10.1007/s11277-017-4544-1

DO - 10.1007/s11277-017-4544-1

M3 - Journal article

AN - SCOPUS:85019683611

SN - 0929-6212

VL - 97

SP - 933

EP - 950

JO - Wireless Personal Communications

JF - Wireless Personal Communications

IS - 1

ER -
