Total Variability Modeling using Source-specific Priors

Sven Ewan Shepstone, Kong Aik Lee, Haizhou Li, Zheng-Hua Tan, Søren Holdt Jensen

Research output: Contribution to journal › Journal article › Research › peer-review

7 Citations (Scopus)

Abstract

In total variability modeling, variable-length speech utterances are mapped to fixed low-dimensional i-vectors. Central to computing the total variability matrix and to i-vector extraction is the computation of the posterior distribution for a latent variable conditioned on an observed feature sequence of an utterance. In both cases the prior for the latent variable is assumed to be non-informative, since for homogeneous datasets there is no gain in generality in using an informative prior. This work shows that, in the heterogeneous case, using informative priors for computing the posterior can lead to favorable results. We focus on modeling the priors using a minimum divergence criterion or factor analysis techniques. Tests on the NIST 2008 and 2010 Speaker Recognition Evaluation (SRE) datasets show that our proposed method beats four baselines: for i-vector extraction using an already trained matrix, five out of eight female and four out of eight male common conditions were improved for the short2-short3 task in SRE’08. For the core-extended task in SRE’10, four out of nine female and six out of nine male common conditions were improved. When incorporating prior information into the training of the T matrix itself, the proposed method beats the baselines for six out of eight female and five out of eight male common conditions for SRE’08, and for five and six out of nine conditions for the male and female case, respectively, for SRE’10. Tests using factor analysis for estimating priors show that two priors do not offer much improvement, but in the case of three separate priors (sparse data), considerable improvements were gained.
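The abstract's central object, the latent-variable posterior with an informative prior, can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the dimensions, statistics, and the particular prior values below are hypothetical toy values, and the formulas are the standard Gaussian i-vector posterior, where a prior N(mu0, Lambda0^{-1}) replaces the usual standard-normal prior (mu0 = 0, Lambda0 = I).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical, for illustration only):
# C Gaussian components, F-dim features, R-dim total variability space.
C, F, R = 4, 3, 2
T = rng.standard_normal((C * F, R))                     # total variability matrix
Sigma_inv = np.eye(C * F)                               # inverse UBM covariance (identity here)
N = np.kron(np.diag(rng.uniform(1, 5, C)), np.eye(F))   # zeroth-order (occupancy) stats
f = rng.standard_normal(C * F)                          # centered first-order stats

def posterior(T, Sigma_inv, N, f, prior_mean, prior_prec):
    """Gaussian posterior of the latent factor w given the sufficient stats.

    With a prior N(mu0, Lambda0^{-1}), the posterior precision is
    Lambda0 + T' Sigma^{-1} N T, and the posterior mean absorbs the
    extra term Lambda0 mu0. The standard (non-informative) i-vector
    is recovered with mu0 = 0, Lambda0 = I.
    """
    L = prior_prec + T.T @ Sigma_inv @ N @ T
    cov = np.linalg.inv(L)
    mean = cov @ (prior_prec @ prior_mean + T.T @ Sigma_inv @ f)
    return mean, cov

# Standard i-vector: standard-normal prior.
w_std, _ = posterior(T, Sigma_inv, N, f, np.zeros(R), np.eye(R))

# Source-specific informative prior (values here are made up);
# in the paper such priors are estimated per source, e.g. via
# minimum divergence or factor analysis.
mu0 = np.array([0.5, -0.2])
Lambda0 = 2.0 * np.eye(R)
w_inf, _ = posterior(T, Sigma_inv, N, f, mu0, Lambda0)
```

The sketch makes the paper's point concrete: the only change between the two extractions is which prior enters the posterior, so an already trained T matrix can be reused with source-specific priors at extraction time.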
Original language: English
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 24
Issue number: 3
Pages (from-to): 504-517
Number of pages: 14
ISSN: 2329-9290
DOI: 10.1109/TASLP.2016.2515506
Publication status: Published - Mar 2016

Cite this

@article{12ec8a8ea2d64b55b9021f90bf991833,
title = "Total Variability Modeling using Source-specific Priors",
abstract = "In total variability modeling, variable length speech utterances are mapped to fixed low-dimensional i-vectors. Central to computing the total variability matrix and i-vector extraction, is the computation of the posterior distribution for a latent variable conditioned on an observed feature sequence of an utterance. In both cases the prior for the latent variable is assumed to be non-informative, since for homogeneous datasets there is no gain in generality in using an informative prior. This work shows in the heterogeneous case, that using informative priors for computing the posterior, can lead to favorable results. We focus on modeling the priors using minimum divergence criterion or factor analysis techniques. Tests on the NIST 2008 and 2010 Speaker Recognition Evaluation (SRE) dataset show that our proposed method beats four baselines: For i-vector extraction using an already trained matrix, for the short2-short3 task in SRE’08, five out of eight female and four out of eight male common conditions, were improved. For the core-extended task in SRE’10, four out of nine female and six out of nine male common conditions were improved. When incorporating prior information into the training of the T matrix itself, the proposed method beats the baselines for six out of eight female and five out of eight male common conditions, for SRE’08, and five and six out of nine conditions, for the male and female case, respectively, for SRE’10. Tests using factor analysis for estimating priors show that two priors do not offer much improvement, but in the case of three separate priors (sparse data), considerable improvements were gained.",
author = "Shepstone, {Sven Ewan} and Lee, {Kong Aik} and Haizhou Li and Zheng-Hua Tan and Jensen, {S{\o}ren Holdt}",
year = "2016",
month = "3",
doi = "10.1109/TASLP.2016.2515506",
language = "English",
volume = "24",
pages = "504--517",
journal = "IEEE/ACM Transactions on Audio, Speech, and Language Processing",
issn = "2329-9290",
publisher = "IEEE Signal Processing Society",
number = "3",
}

Total Variability Modeling using Source-specific Priors. / Shepstone, Sven Ewan; Lee, Kong Aik; Li, Haizhou; Tan, Zheng-Hua; Jensen, Søren Holdt.

In: IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 24, No. 3, 03.2016, p. 504-517.


TY - JOUR

T1 - Total Variability Modeling using Source-specific Priors

AU - Shepstone, Sven Ewan

AU - Lee, Kong Aik

AU - Li, Haizhou

AU - Tan, Zheng-Hua

AU - Jensen, Søren Holdt

PY - 2016/3

Y1 - 2016/3

N2 - In total variability modeling, variable length speech utterances are mapped to fixed low-dimensional i-vectors. Central to computing the total variability matrix and i-vector extraction, is the computation of the posterior distribution for a latent variable conditioned on an observed feature sequence of an utterance. In both cases the prior for the latent variable is assumed to be non-informative, since for homogeneous datasets there is no gain in generality in using an informative prior. This work shows in the heterogeneous case, that using informative priors for computing the posterior, can lead to favorable results. We focus on modeling the priors using minimum divergence criterion or factor analysis techniques. Tests on the NIST 2008 and 2010 Speaker Recognition Evaluation (SRE) dataset show that our proposed method beats four baselines: For i-vector extraction using an already trained matrix, for the short2-short3 task in SRE’08, five out of eight female and four out of eight male common conditions, were improved. For the core-extended task in SRE’10, four out of nine female and six out of nine male common conditions were improved. When incorporating prior information into the training of the T matrix itself, the proposed method beats the baselines for six out of eight female and five out of eight male common conditions, for SRE’08, and five and six out of nine conditions, for the male and female case, respectively, for SRE’10. Tests using factor analysis for estimating priors show that two priors do not offer much improvement, but in the case of three separate priors (sparse data), considerable improvements were gained.

AB - In total variability modeling, variable length speech utterances are mapped to fixed low-dimensional i-vectors. Central to computing the total variability matrix and i-vector extraction, is the computation of the posterior distribution for a latent variable conditioned on an observed feature sequence of an utterance. In both cases the prior for the latent variable is assumed to be non-informative, since for homogeneous datasets there is no gain in generality in using an informative prior. This work shows in the heterogeneous case, that using informative priors for computing the posterior, can lead to favorable results. We focus on modeling the priors using minimum divergence criterion or factor analysis techniques. Tests on the NIST 2008 and 2010 Speaker Recognition Evaluation (SRE) dataset show that our proposed method beats four baselines: For i-vector extraction using an already trained matrix, for the short2-short3 task in SRE’08, five out of eight female and four out of eight male common conditions, were improved. For the core-extended task in SRE’10, four out of nine female and six out of nine male common conditions were improved. When incorporating prior information into the training of the T matrix itself, the proposed method beats the baselines for six out of eight female and five out of eight male common conditions, for SRE’08, and five and six out of nine conditions, for the male and female case, respectively, for SRE’10. Tests using factor analysis for estimating priors show that two priors do not offer much improvement, but in the case of three separate priors (sparse data), considerable improvements were gained.

U2 - 10.1109/TASLP.2016.2515506

DO - 10.1109/TASLP.2016.2515506

M3 - Journal article

VL - 24

SP - 504

EP - 517

JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing

JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing

SN - 2329-9290

IS - 3

ER -