Source-specific Informative Prior for i-Vector Extraction

Sven Ewan Shepstone; Kong Aik Lee; Haizhou Li; Zheng-Hua Tan; Søren Holdt Jensen

doi:10.1109/ICASSP.2015.7178759

Department of Electronic Systems

Publication: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-reviewed

5 Citations (Scopus)

Abstract

An i-vector is a low-dimensional fixed-length representation of a variable-length speech utterance, and is defined as the posterior mean of a latent variable conditioned on the observed feature sequence of an utterance. The assumption is that the prior for the latent variable is non-informative, since for homogeneous datasets there is no gain in generality in using an informative prior. This work shows that, for a heterogeneous dataset containing speech samples recorded from multiple sources, extracting i-vectors with informative priors instead is applicable and leads to favorable results. Tests carried out on the NIST 2008 and 2010 Speaker Recognition Evaluation (SRE) datasets show that the proposed method beats three baselines. For the short2-short3 core task in SRE'08, it wins in five (female) and six (male) of the eight common conditions; for the core-core task in SRE'10, it wins in five of the nine common conditions for both genders.
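
The role of the prior in the posterior mean can be made concrete with the usual linear-Gaussian result. The block below is only a sketch under assumptions that do not come from this record: it uses the conventional total variability notation (matrix $T$, UBM covariance $\Sigma$, zeroth-order statistics $N$, centered first-order statistics $\tilde{f}$), takes the standard normal prior as the conventional case, and introduces a hypothetical Gaussian prior with mean $\mu_p$ and covariance $\Sigma_p$. How the paper estimates source-specific priors is not stated in the abstract and is not reproduced here.

```latex
% Conventional case: prior w ~ N(0, I)
\begin{align}
  L &= I + T^{\top} \Sigma^{-1} N T, \\
  \mathbb{E}[w \mid X] &= L^{-1} T^{\top} \Sigma^{-1} \tilde{f}.
\end{align}

% Assumed informative prior: w ~ N(\mu_p, \Sigma_p)
\begin{align}
  L_p &= \Sigma_p^{-1} + T^{\top} \Sigma^{-1} N T, \\
  \mathbb{E}[w \mid X] &= L_p^{-1} \left( T^{\top} \Sigma^{-1} \tilde{f} + \Sigma_p^{-1} \mu_p \right).
\end{align}
```

Setting $\mu_p = 0$ and $\Sigma_p = I$ in the second pair recovers the first, so under these assumptions the informative prior only adds a prior-precision term and a prior-mean offset to the conventional extraction formula.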

Original language	English
Title of host publication	IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015
Publisher	IEEE Signal Processing Society
Publication date	Apr 2015
Pages	4185-4189
ISBN (Electronic)	978-1-4673-6997-8
DOI	https://doi.org/10.1109/ICASSP.2015.7178759
Status	Published - Apr 2015
Event	40th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2015 - Brisbane, Australia. Duration: 19 Apr 2015 → 24 Apr 2015. Conference number: 2015

Conference

Conference	40th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2015
Number	2015
Country/Territory	Australia
City	Brisbane
Period	19/04/2015 → 24/04/2015

Name	I E E E International Conference on Acoustics, Speech and Signal Processing. Proceedings
ISSN	1520-6149

Keywords

i-vector, informative prior, total variability, source variation

Access to document

10.1109/ICASSP.2015.7178759

Citation formats

@inproceedings{fb956522d4b7453aa9aaddfdb22d26db,
  title = "Source-specific Informative Prior for i-Vector Extraction",
  abstract = "An i-vector is a low-dimensional fixed-length representation of a variable-length speech utterance, and is defined as the posterior mean of a latent variable conditioned on the observed feature sequence of an utterance. The assumption is that the prior for the latent variable is non-informative, since for homogeneous datasets there is no gain in generality in using an informative prior. This work shows that extracting i-vectors for a heterogeneous dataset, containing speech samples recorded from multiple sources, using informative priors instead is applicable, and leads to favorable results. Tests carried out on the NIST 2008 and 2010 Speaker Recognition Evaluation (SRE) dataset show that our proposed method beats three baselines: For the short2-short3 core-task in SRE'08, for the female and male cases, five and six respectively, out of eight common conditions were beaten, and for the core-core task in SRE'10, for both genders, five out of nine common conditions were beaten.",
  keywords = "i-vector, informative prior, total variability, source variation",
  author = "Shepstone, {Sven Ewan} and Lee, {Kong Aik} and Haizhou Li and Zheng-Hua Tan and Jensen, {S{\o}ren Holdt}",
  year = "2015",
  month = apr,
  doi = "10.1109/ICASSP.2015.7178759",
  language = "English",
  series = "I E E E International Conference on Acoustics, Speech and Signal Processing. Proceedings",
  publisher = "IEEE Signal Processing Society",
  pages = "4185 -- 4189",
  booktitle = "IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015",
  address = "United States",
  note = "40th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2015, ICASSP ; Conference date: 19-04-2015 Through 24-04-2015",
}

Shepstone, SE, Lee, KA, Li, H, Tan, Z-H & Jensen, SH 2015, Source-specific Informative Prior for i-Vector Extraction. in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015. IEEE Signal Processing Society, I E E E International Conference on Acoustics, Speech and Signal Processing. Proceedings, pp. 4185-4189, 40th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2015, Brisbane, Australia, 19/04/2015. https://doi.org/10.1109/ICASSP.2015.7178759

Source-specific Informative Prior for i-Vector Extraction. / Shepstone, Sven Ewan; Lee, Kong Aik; Li, Haizhou et al.
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015. IEEE Signal Processing Society, 2015. p. 4185-4189 (I E E E International Conference on Acoustics, Speech and Signal Processing. Proceedings).

TY - GEN
T1 - Source-specific Informative Prior for i-Vector Extraction
AU - Shepstone, Sven Ewan
AU - Lee, Kong Aik
AU - Li, Haizhou
AU - Tan, Zheng-Hua
AU - Jensen, Søren Holdt
N1 - Conference code: 2015
PY - 2015/4
Y1 - 2015/4
N2 - An i-vector is a low-dimensional fixed-length representation of a variable-length speech utterance, and is defined as the posterior mean of a latent variable conditioned on the observed feature sequence of an utterance. The assumption is that the prior for the latent variable is non-informative, since for homogeneous datasets there is no gain in generality in using an informative prior. This work shows that extracting i-vectors for a heterogeneous dataset, containing speech samples recorded from multiple sources, using informative priors instead is applicable, and leads to favorable results. Tests carried out on the NIST 2008 and 2010 Speaker Recognition Evaluation (SRE) dataset show that our proposed method beats three baselines: For the short2-short3 core-task in SRE'08, for the female and male cases, five and six respectively, out of eight common conditions were beaten, and for the core-core task in SRE'10, for both genders, five out of nine common conditions were beaten.
AB - An i-vector is a low-dimensional fixed-length representation of a variable-length speech utterance, and is defined as the posterior mean of a latent variable conditioned on the observed feature sequence of an utterance. The assumption is that the prior for the latent variable is non-informative, since for homogeneous datasets there is no gain in generality in using an informative prior. This work shows that extracting i-vectors for a heterogeneous dataset, containing speech samples recorded from multiple sources, using informative priors instead is applicable, and leads to favorable results. Tests carried out on the NIST 2008 and 2010 Speaker Recognition Evaluation (SRE) dataset show that our proposed method beats three baselines: For the short2-short3 core-task in SRE'08, for the female and male cases, five and six respectively, out of eight common conditions were beaten, and for the core-core task in SRE'10, for both genders, five out of nine common conditions were beaten.
KW - i-vector, informative prior, total variability, source variation
U2 - 10.1109/ICASSP.2015.7178759
DO - 10.1109/ICASSP.2015.7178759
M3 - Article in proceeding
T3 - I E E E International Conference on Acoustics, Speech and Signal Processing. Proceedings
SP - 4185
EP - 4189
BT - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015
PB - IEEE Signal Processing Society
T2 - 40th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2015
Y2 - 19 April 2015 through 24 April 2015
ER -

Shepstone SE, Lee KA, Li H, Tan Z-H, Jensen SH. Source-specific Informative Prior for i-Vector Extraction. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015. IEEE Signal Processing Society. 2015. p. 4185-4189. (I E E E International Conference on Acoustics, Speech and Signal Processing. Proceedings). doi: 10.1109/ICASSP.2015.7178759
