Abstract
In this letter, we propose a vocal tract length (VTL) perturbation method for text-dependent speaker verification (TD-SV), in which a set of TD-SV systems is trained, one for each VTL factor, and score-level fusion is applied to make a final decision. Next, we explore the bottleneck (BN) feature extracted by training deep neural networks with a self-supervised learning objective, autoregressive predictive coding (APC), for TD-SV and compare it with the well-studied speaker-discriminant BN feature. The proposed VTL method is then applied to APC and speaker-discriminant BN features. In the end, we combine the VTL perturbation systems trained on MFCC and the two BN features in the score domain. Experiments are performed on the RedDots challenge 2016 database of TD-SV using short utterances with Gaussian mixture model-universal background model and i-vector techniques. Results show the proposed methods significantly outperform the baselines.
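The score-level fusion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the per-system scores are combined, so the weighted average below (equal weights by default) is an assumption, and the function name `fuse_scores`, the VTL factor values, and the score values are all hypothetical.

```python
import numpy as np


def fuse_scores(scores_per_system, weights=None):
    """Fuse verification scores from several systems by a weighted average.

    scores_per_system: array-like of shape (n_systems, n_trials), one row
        per system (for example, one system per VTL factor).
    weights: optional array-like of shape (n_systems,). Equal weights are
        used when omitted (an assumption, not taken from the paper).
    Returns an array of shape (n_trials,) with one fused score per trial.
    """
    scores = np.asarray(scores_per_system, dtype=float)
    if scores.ndim != 2:
        raise ValueError("expected a 2-D array: (n_systems, n_trials)")
    n_systems = scores.shape[0]
    if weights is None:
        weights = np.full(n_systems, 1.0 / n_systems)
    else:
        weights = np.asarray(weights, dtype=float)
        if weights.shape != (n_systems,):
            raise ValueError("need exactly one weight per system")
        weights = weights / weights.sum()  # normalise so the weights sum to 1
    return weights @ scores


# Hypothetical scores from three systems (e.g. VTL factors 0.9, 1.0, 1.1)
# on four verification trials.
example_scores = [
    [1.2, -0.4, 0.3, 2.0],
    [1.0, -0.6, 0.1, 1.8],
    [1.4, -0.2, 0.5, 2.2],
]
fused = fuse_scores(example_scores)
```

The fused score for each trial would then be compared against a decision threshold. Combining systems built on different features (MFCC and the two BN features) could reuse the same function, with one row per system.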
Original language | English |
---|---|
Article number | 9339931 |
Journal | IEEE Signal Processing Letters |
Volume | 28 |
Pages (from-to) | 364-368 |
Number of pages | 5 |
ISSN | 1070-9908 |
DOIs | |
Publication status | Published - 28 Jan 2021 |
Keywords
- Autoregressive prediction coding
- Data models
- Databases
- Feature extraction
- GMM-UBM
- I-vector
- Mel frequency cepstral coefficient
- Perturbation methods
- Principal component analysis
- Text-dependent speaker verification
- Training
- VTL factor