Further optimisations of constant Q cepstral processing for integrated utterance and text-dependent speaker verification

Hector Delgado; Massimiliano Todisco; Md Sahidullah; Achintya Kumar Sarkar; Nicholas Evans; Tomi Kinnunen; Zheng-Hua Tan

doi:10.1109/SLT.2016.7846262

Department of Electronic Systems

Publication: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

32 Citations (Scopus)

Abstract

Many authentication applications involving automatic speaker verification (ASV) demand robust performance using short-duration, fixed or prompted text utterances. Text constraints not only reduce the phone-mismatch between enrollment and test utterances, which generally leads to improved performance, but also provide an ancillary level of security. This can take the form of explicit utterance verification (UV). An integrated UV + ASV system should then verify access attempts which contain not just the expected speaker, but also the expected text content. This paper presents such a system and introduces new features which are used for both UV and ASV tasks. Based upon multi-resolution, spectro-temporal analysis and when fused with more traditional parameterisations, the new features not only generally outperform Mel-frequency cepstral coefficients, but also are shown to be complementary when fusing systems at score level. Finally, the joint operation of UV and ASV greatly decreases false acceptances for unmatched text trials.

Original language	English
Title	Spoken Language Technology Workshop (SLT), 2016 IEEE
Number of pages	7
Publisher	IEEE
Publication date	13 Dec 2016
Pages	179-185
ISBN (Electronic)	978-1-5090-4903-5
DOI	https://doi.org/10.1109/SLT.2016.7846262
Status	Published - 13 Dec 2016
Event	2016 IEEE Workshop on Spoken Language Technology - San Diego, California, USA. Duration: 13 Dec 2016 → 16 Dec 2016. http://www.slt2016.org/default.asp

Conference

Conference	2016 IEEE Workshop on Spoken Language Technology
Country/Territory	USA
City	San Diego, California
Period	13/12/2016 → 16/12/2016
Internet address	http://www.slt2016.org/default.asp

Citation formats

@inproceedings{3314ac907a0047e8b7e4dbf7077647c2,

title = "Further optimisations of constant Q cepstral processing for integrated utterance and text-dependent speaker verification",

abstract = "Many authentication applications involving automatic speaker verification (ASV) demand robust performance using short-duration, fixed or prompted text utterances. Text constraints not only reduce the phone-mismatch between enrollment and test utterances, which generally leads to improved performance, but also provide an ancillary level of security. This can take the form of explicit utterance verification (UV). An integrated UV + ASV system should then verify access attempts which contain not just the expected speaker, but also the expected text content. This paper presents such a system and introduces new features which are used for both UV and ASV tasks. Based upon multi-resolution, spectro-temporal analysis and when fused with more traditional parameterisations, the new features not only generally outperform Mel-frequency cepstral coefficients, but also are shown to be complementary when fusing systems at score level. Finally, the joint operation of UV and ASV greatly decreases false acceptances for unmatched text trials.",

keywords = "speaker verification, utterance verification, text dependent, constant Q transform",

author = "Hector Delgado and Massimiliano Todisco and Md Sahidullah and Sarkar, {Achintya Kumar} and Nicholas Evans and Tomi Kinnunen and Zheng-Hua Tan",

year = "2016",

month = dec,

day = "13",

doi = "10.1109/SLT.2016.7846262",

language = "English",

pages = "179--185",

booktitle = "Spoken Language Technology Workshop (SLT), 2016 IEEE",

publisher = "IEEE",

address = "United States",

note = "2016 IEEE Workshop on Spoken Language Technology, SLT ; Conference date: 13-12-2016 Through 16-12-2016",

url = "http://www.slt2016.org/default.asp",

}

Delgado, H, Todisco, M, Sahidullah, M, Sarkar, AK, Evans, N, Kinnunen, T & Tan, Z-H 2016, Further optimisations of constant Q cepstral processing for integrated utterance and text-dependent speaker verification. in Spoken Language Technology Workshop (SLT), 2016 IEEE. IEEE, pp. 179-185, 2016 IEEE Workshop on Spoken Language Technology, San Diego, California, USA, 13/12/2016. https://doi.org/10.1109/SLT.2016.7846262

Further optimisations of constant Q cepstral processing for integrated utterance and text-dependent speaker verification. / Delgado, Hector; Todisco, Massimiliano; Sahidullah, Md et al.
Spoken Language Technology Workshop (SLT), 2016 IEEE. IEEE, 2016. p. 179-185.

TY - GEN

T1 - Further optimisations of constant Q cepstral processing for integrated utterance and text-dependent speaker verification

AU - Delgado, Hector

AU - Todisco, Massimiliano

AU - Sahidullah, Md

AU - Sarkar, Achintya Kumar

AU - Evans, Nicholas

AU - Kinnunen, Tomi

AU - Tan, Zheng-Hua

PY - 2016/12/13

Y1 - 2016/12/13

N2 - Many authentication applications involving automatic speaker verification (ASV) demand robust performance using short-duration, fixed or prompted text utterances. Text constraints not only reduce the phone-mismatch between enrollment and test utterances, which generally leads to improved performance, but also provide an ancillary level of security. This can take the form of explicit utterance verification (UV). An integrated UV + ASV system should then verify access attempts which contain not just the expected speaker, but also the expected text content. This paper presents such a system and introduces new features which are used for both UV and ASV tasks. Based upon multi-resolution, spectro-temporal analysis and when fused with more traditional parameterisations, the new features not only generally outperform Mel-frequency cepstral coefficients, but also are shown to be complementary when fusing systems at score level. Finally, the joint operation of UV and ASV greatly decreases false acceptances for unmatched text trials.

AB - Many authentication applications involving automatic speaker verification (ASV) demand robust performance using short-duration, fixed or prompted text utterances. Text constraints not only reduce the phone-mismatch between enrollment and test utterances, which generally leads to improved performance, but also provide an ancillary level of security. This can take the form of explicit utterance verification (UV). An integrated UV + ASV system should then verify access attempts which contain not just the expected speaker, but also the expected text content. This paper presents such a system and introduces new features which are used for both UV and ASV tasks. Based upon multi-resolution, spectro-temporal analysis and when fused with more traditional parameterisations, the new features not only generally outperform Mel-frequency cepstral coefficients, but also are shown to be complementary when fusing systems at score level. Finally, the joint operation of UV and ASV greatly decreases false acceptances for unmatched text trials.

KW - speaker verification

KW - utterance verification

KW - text dependent

KW - constant Q transform

UR - http://www.slt2016.org/default.asp

UR - http://www.slt2016.org/Papers/ViewPapers.asp?PaperNum=1080

U2 - 10.1109/SLT.2016.7846262

DO - 10.1109/SLT.2016.7846262

M3 - Article in proceeding

SP - 179

EP - 185

BT - Spoken Language Technology Workshop (SLT), 2016 IEEE

PB - IEEE

T2 - 2016 IEEE Workshop on Spoken Language Technology

Y2 - 13 December 2016 through 16 December 2016

ER -
