Utterance Verification for Text-Dependent Speaker Recognition: A Comparative Assessment Using the RedDots Corpus

Tomi Kinnunen, Md Sahidullah, Ivan Kukanov, Hector Delgado, Massimiliano Todisco, Achintya Kumar Sarkar, Nicolai Bæk Thomsen, Ville Hautamaki, Nicholas Evans, Zheng-Hua Tan

Research output: Contribution to book/anthology/report/conference proceeding > Article in proceeding > Research > peer-review

32 Citations (Scopus)

Abstract

Text-dependent automatic speaker verification naturally calls for the simultaneous verification of speaker identity and spoken content. These two tasks can be achieved with automatic speaker verification (ASV) and utterance verification (UV) technologies. While both have been addressed previously in the literature, a treatment of simultaneous speaker and utterance verification with a modern, standard database is so far lacking. This is despite the burgeoning demand for voice biometrics in a plethora of practical security applications. With the goal of improving overall verification performance, this paper reports different strategies for simultaneous ASV and UV in the context of short-duration, text-dependent speaker verification. Experiments performed on the recently released RedDots corpus are reported for three different ASV systems and four different UV systems. Results show that the combination of utterance verification with automatic speaker verification is (almost) universally beneficial with significant performance improvements being observed.

Original language: English
Title of host publication: Proceedings Interspeech 2016
Number of pages: 5
Publisher: ISCA
Publication date: 8 Sept 2016
Publication status: Published - 8 Sept 2016
Event: Interspeech 2016 - San Francisco, CA, United States
Duration: 8 Sept 2016 to 12 Sept 2016
http://www.interspeech2016.org/

Conference

Conference: Interspeech 2016
Country/Territory: United States
City: San Francisco, CA
Period: 08/09/2016 to 12/09/2016
Series: INTERSPEECH
ISSN: 1990-9770
