TY - JOUR
T1 - Vocal Tract Length Perturbation for Text-Dependent Speaker Verification with Autoregressive Prediction Coding
AU - Sarkar, Achintya
AU - Tan, Zheng-Hua
PY - 2021/1/28
Y1 - 2021/1/28
N2 - In this letter, we propose a vocal tract length (VTL) perturbation method for text-dependent speaker verification (TD-SV), in which a set of TD-SV systems are trained, one for each VTL factor, and score-level fusion is applied to make a final decision. Next, we explore the bottleneck (BN) feature extracted by training deep neural networks with a self-supervised learning objective, autoregressive predictive coding (APC), for TD-SV and compare it with the well-studied speaker-discriminant BN feature. The proposed VTL method is then applied to APC and speaker-discriminant BN features. In the end, we combine the VTL perturbation systems trained on MFCC and the two BN features in the score domain. Experiments are performed on the RedDots challenge 2016 database of TD-SV using short utterances with Gaussian mixture model-universal background model and i-vector techniques. Results show the proposed methods significantly outperform the baselines.
KW - Autoregressive prediction coding
KW - Data models
KW - Databases
KW - Feature extraction
KW - GMM-UBM
KW - I-vector
KW - Mel frequency cepstral coefficient
KW - Perturbation methods
KW - Principal component analysis
KW - Text-dependent speaker verification
KW - Training
KW - VTL factor
UR - http://www.scopus.com/inward/record.url?scp=85100501375&partnerID=8YFLogxK
U2 - 10.1109/LSP.2021.3055180
DO - 10.1109/LSP.2021.3055180
M3 - Journal article
SN - 1070-9908
VL - 28
SP - 364
EP - 368
JO - IEEE Signal Processing Letters
JF - IEEE Signal Processing Letters
M1 - 9339931
ER -