TY - JOUR
T1 - Vocal Tract Length Perturbation for Text-Dependent Speaker Verification with Autoregressive Prediction Coding
AU - Sarkar, Achintya
AU - Tan, Zheng-Hua
PY - 2021/1/28
Y1 - 2021/1/28
N2 - In this letter, we propose a vocal tract length (VTL) perturbation method for text-dependent speaker verification (TD-SV), in which a set of TD-SV systems are trained, one for each VTL factor, and score-level fusion is applied to make a final decision. Next, we explore the bottleneck (BN) feature extracted by training deep neural networks with a self-supervised learning objective, autoregressive predictive coding (APC), for TD-SV and compare it with the well-studied speaker-discriminant BN feature. The proposed VTL method is then applied to APC and speaker-discriminant BN features. In the end, we combine the VTL perturbation systems trained on MFCC and the two BN features in the score domain. Experiments are performed on the RedDots challenge 2016 database of TD-SV using short utterances with Gaussian mixture model-universal background model and i-vector techniques. Results show the proposed methods significantly outperform the baselines.
KW - Autoregressive prediction coding
KW - Data models
KW - Databases
KW - Feature extraction
KW - GMM-UBM
KW - I-vector
KW - Mel frequency cepstral coefficient
KW - Perturbation methods
KW - Principal component analysis
KW - Text-dependent speaker verification
KW - Training
KW - VTL factor
UR - http://www.scopus.com/inward/record.url?scp=85100501375&partnerID=8YFLogxK
U2 - 10.1109/LSP.2021.3055180
DO - 10.1109/LSP.2021.3055180
M3 - Journal article
SN - 1070-9908
VL - 28
SP - 364
EP - 368
JO - IEEE Signal Processing Letters
JF - IEEE Signal Processing Letters
M1 - 9339931
ER -