Vocal Tract Length Perturbation for Text-Dependent Speaker Verification with Autoregressive Prediction Coding

Achintya Sarkar, Zheng-Hua Tan

Research output: Contribution to journal, Journal article, Research, peer-review

8 Citations (Scopus)
22 Downloads (Pure)

Abstract

In this letter, we propose a vocal tract length (VTL) perturbation method for text-dependent speaker verification (TD-SV), in which a set of TD-SV systems are trained, one for each VTL factor, and score-level fusion is applied to make a final decision. Next, we explore the bottleneck (BN) feature extracted by training deep neural networks with a self-supervised learning objective, autoregressive predictive coding (APC), for TD-SV and compare it with the well-studied speaker-discriminant BN feature. The proposed VTL method is then applied to APC and speaker-discriminant BN features. In the end, we combine the VTL perturbation systems trained on MFCC and the two BN features in the score domain. Experiments are performed on the RedDots challenge 2016 database of TD-SV using short utterances with Gaussian mixture model-universal background model and i-vector techniques. Results show the proposed methods significantly outperform the baselines.
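
The score-level fusion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so the equal-weight averaging below is an assumption, and the VTL factor labels, score values, threshold, and the names `fuse_scores` and `decide` are hypothetical, chosen only for illustration.

```python
def fuse_scores(scores_per_system, weights=None):
    """Fuse per-trial scores from several systems by weighted averaging.

    scores_per_system maps a system label (here, a hypothetical VTL factor)
    to a list of trial scores; every list must cover the same trials in the
    same order. Equal weights are an assumption, not the paper's stated rule.
    """
    systems = list(scores_per_system)
    n_trials = len(scores_per_system[systems[0]])
    if any(len(scores_per_system[s]) != n_trials for s in systems):
        raise ValueError("all systems must score the same trials")
    if weights is None:
        weights = {s: 1.0 / len(systems) for s in systems}
    # One fused score per trial: weighted sum across systems.
    return [
        sum(weights[s] * scores_per_system[s][i] for s in systems)
        for i in range(n_trials)
    ]


def decide(fused_scores, threshold=0.0):
    """Accept a trial when its fused score reaches the (hypothetical) threshold."""
    return [score >= threshold for score in fused_scores]


# Hypothetical scores for two trials from three systems,
# each assumed to be trained with a different VTL factor.
scores = {
    "vtl_0.9": [1.0, -2.0],
    "vtl_1.0": [2.0, -1.0],
    "vtl_1.1": [3.0, 0.0],
}
fused = fuse_scores(scores)
decisions = decide(fused)
```

The same function could in principle fuse systems built on different features (MFCC and the two BN features), since it only requires that every system scores the same trial list.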

Original language: English
Article number: 9339931
Journal: IEEE Signal Processing Letters
Volume: 28
Pages (from-to): 364-368
Number of pages: 5
ISSN: 1070-9908
Publication status: Published - 28 Jan 2021

Keywords

  • Autoregressive prediction coding
  • Data models
  • Databases
  • Feature extraction
  • GMM-UBM
  • I-vector
  • Mel frequency cepstral coefficient
  • Perturbation methods
  • Principal component analysis
  • Text-dependent speaker verification
  • Training
  • VTL factor
