Shouted Speech Compensation for Speaker Verification Robust to Vocal Effort Conditions

Santiago Prieto-Calero; Alfonso Ortega; Iván López Espejo; Eduardo Lleida

doi:10.21437/Interspeech.2020-1402

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

2 Citations (Scopus)

45 Downloads (Pure)

Abstract

The performance of speaker verification systems degrades when vocal effort conditions between enrollment and test (e.g., shouted vs. normal speech) are different. This is a potential situation in non-cooperative speaker verification tasks. In this paper, we present a study on different methods for linear compensation of embeddings making use of Gaussian mixture models to cluster shouted and normal speech domains. These compensation techniques are borrowed from the area of robustness for automatic speech recognition and, in this work, we apply them to compensate the mismatch between shouted and normal conditions in speaker verification. Before compensation, shouted condition is automatically detected by means of logistic regression. The process is computationally light and it is performed in the back-end of an x-vector system. Experimental results show that applying the proposed approach in the presence of vocal effort mismatch yields up to 13.8% equal error rate relative improvement with respect to a system that applies neither shouted speech detection nor compensation.
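As a rough illustration of the back-end pipeline the abstract outlines (detect the shouted condition with logistic regression, then apply a GMM-driven linear compensation to the embedding), a minimal sketch follows. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the function names, the diagonal-covariance GMM, the posterior-weighted bias form x - sum_k p(k|x) r_k, the 0.5 decision threshold, and the 2-D toy parameters standing in for x-vectors.

```python
import numpy as np


def shouted_probability(x, w, b):
    """Logistic-regression score: probability that embedding x is shouted."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))


def gmm_posteriors(x, weights, means, variances):
    """Posterior p(k | x) under a diagonal-covariance GMM (assumed form)."""
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_exp = -0.5 * np.sum((x - means) ** 2 / variances, axis=1)
    log_joint = np.log(weights) + log_norm + log_exp
    log_joint -= log_joint.max()  # stabilise before exponentiating
    post = np.exp(log_joint)
    return post / post.sum()


def compensate(x, weights, means, variances, biases):
    """Subtract the posterior-weighted bias: x - sum_k p(k|x) * r_k."""
    post = gmm_posteriors(x, weights, means, variances)
    return x - post @ biases


def process(x, w, b, gmm, biases, threshold=0.5):
    """Compensate only the embeddings detected as shouted."""
    if shouted_probability(x, w, b) > threshold:
        return compensate(x, *gmm, biases)
    return x


# Hypothetical toy parameters (2-D embeddings, 2 shouted clusters).
w, b = np.array([1.0, 1.0]), -2.0                 # detector weights
gmm = (
    np.array([0.5, 0.5]),                         # mixture weights
    np.array([[2.0, 2.0], [4.0, 0.0]]),           # cluster means
    np.ones((2, 2)),                              # diagonal variances
)
biases = np.array([[2.0, 2.0], [4.0, 0.0]])       # per-cluster offsets r_k

x_shouted = np.array([2.0, 2.0])
x_normal = np.array([0.0, 0.0])
y_shouted = process(x_shouted, w, b, gmm, biases)
y_normal = process(x_normal, w, b, gmm, biases)
```

In this toy setup the embedding flagged as shouted is pulled back toward the origin, where the normal-speech embeddings are assumed to sit, while the embedding classified as normal passes through unchanged. In practice the detector, the GMM and the bias vectors would all be estimated from training data.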

Original language	English
Title of host publication	Interspeech 2020
Number of pages	5
Publication date	2020
Pages	1511-1515
DOIs	https://doi.org/10.21437/Interspeech.2020-1402
Publication status	Published - 2020
Event	Interspeech 2020, Shanghai, China, 25 Oct 2020 → 29 Oct 2020

Conference

Conference	Interspeech 2020
Country/Territory	China
City	Shanghai
Period	25/10/2020 → 29/10/2020

Series	Proceedings of the International Conference on Spoken Language Processing
ISSN	1990-9772

Keywords

Domain compensation
Shouted speech
Speaker verification
Vocal effort mismatch

Access to Document

10.21437/Interspeech.2020-1402

Open Access Article: Final published version, 447 KB

Cite this

@inproceedings{1e9e1d5f26134382be818008f0567031,
  title = "Shouted Speech Compensation for Speaker Verification Robust to Vocal Effort Conditions",
  abstract = "The performance of speaker verification systems degrades when vocal effort conditions between enrollment and test (e.g., shouted vs. normal speech) are different. This is a potential situation in non-cooperative speaker verification tasks. In this paper, we present a study on different methods for linear compensation of embeddings making use of Gaussian mixture models to cluster shouted and normal speech domains. These compensation techniques are borrowed from the area of robustness for automatic speech recognition and, in this work, we apply them to compensate the mismatch between shouted and normal conditions in speaker verification. Before compensation, shouted condition is automatically detected by means of logistic regression. The process is computationally light and it is performed in the back-end of an x-vector system. Experimental results show that applying the proposed approach in the presence of vocal effort mismatch yields up to 13.8\% equal error rate relative improvement with respect to a system that applies neither shouted speech detection nor compensation.",
  keywords = "Domain compensation, Shouted speech, Speaker verification, Vocal effort mismatch",
  author = "Santiago Prieto-Calero and Alfonso Ortega and {L{\'o}pez Espejo}, Iv{\'a}n and Eduardo Lleida",
  year = "2020",
  doi = "10.21437/Interspeech.2020-1402",
  language = "English",
  series = "Proceedings of the International Conference on Spoken Language Processing",
  publisher = "International Speech Communication Association",
  pages = "1511--1515",
  booktitle = "Interspeech 2020",
  note = "Interspeech 2020 ; Conference date: 25-10-2020 Through 29-10-2020",
}

Shouted Speech Compensation for Speaker Verification Robust to Vocal Effort Conditions. / Prieto-Calero, Santiago; Ortega, Alfonso; López Espejo, Iván; Lleida, Eduardo.
Interspeech 2020. 2020. p. 1511-1515 (Proceedings of the International Conference on Spoken Language Processing).

TY - GEN

T1 - Shouted Speech Compensation for Speaker Verification Robust to Vocal Effort Conditions

AU - Prieto-Calero, Santiago

AU - Ortega, Alfonso

AU - López Espejo, Iván

AU - Lleida, Eduardo

PY - 2020

Y1 - 2020

N2 - The performance of speaker verification systems degrades when vocal effort conditions between enrollment and test (e.g., shouted vs. normal speech) are different. This is a potential situation in non-cooperative speaker verification tasks. In this paper, we present a study on different methods for linear compensation of embeddings making use of Gaussian mixture models to cluster shouted and normal speech domains. These compensation techniques are borrowed from the area of robustness for automatic speech recognition and, in this work, we apply them to compensate the mismatch between shouted and normal conditions in speaker verification. Before compensation, shouted condition is automatically detected by means of logistic regression. The process is computationally light and it is performed in the back-end of an x-vector system. Experimental results show that applying the proposed approach in the presence of vocal effort mismatch yields up to 13.8% equal error rate relative improvement with respect to a system that applies neither shouted speech detection nor compensation.

AB - The performance of speaker verification systems degrades when vocal effort conditions between enrollment and test (e.g., shouted vs. normal speech) are different. This is a potential situation in non-cooperative speaker verification tasks. In this paper, we present a study on different methods for linear compensation of embeddings making use of Gaussian mixture models to cluster shouted and normal speech domains. These compensation techniques are borrowed from the area of robustness for automatic speech recognition and, in this work, we apply them to compensate the mismatch between shouted and normal conditions in speaker verification. Before compensation, shouted condition is automatically detected by means of logistic regression. The process is computationally light and it is performed in the back-end of an x-vector system. Experimental results show that applying the proposed approach in the presence of vocal effort mismatch yields up to 13.8% equal error rate relative improvement with respect to a system that applies neither shouted speech detection nor compensation.

KW - Domain compensation

KW - Shouted speech

KW - Speaker verification

KW - Vocal effort mismatch

UR - http://www.scopus.com/inward/record.url?scp=85098122864&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2020-1402

DO - 10.21437/Interspeech.2020-1402

M3 - Article in proceeding

T3 - Proceedings of the International Conference on Spoken Language Processing

SP - 1511

EP - 1515

BT - Interspeech 2020

T2 - Interspeech 2020

Y2 - 25 October 2020 through 29 October 2020

ER -
