Shouted Speech Compensation for Speaker Verification Robust to Vocal Effort Conditions

Santiago Prieto-Calero, Alfonso Ortega, Iván López Espejo, Eduardo Lleida

Research output: Contribution to book/anthology/report/conference proceeding; Article in proceeding; Research; peer-review

Abstract

The performance of speaker verification systems degrades when vocal effort differs between enrollment and test (e.g., shouted vs. normal speech). Such a mismatch can arise in non-cooperative speaker verification tasks. In this paper, we study different methods for linear compensation of embeddings that use Gaussian mixture models to cluster the shouted and normal speech domains. These compensation techniques are borrowed from the field of robust automatic speech recognition; in this work, we apply them to compensate for the mismatch between shouted and normal conditions in speaker verification. Before compensation, the shouted condition is automatically detected by means of logistic regression. The process is computationally light and is performed in the back-end of an x-vector system. Experimental results show that, in the presence of vocal effort mismatch, the proposed approach yields a relative improvement in equal error rate of up to 13.8% with respect to a system that applies neither shouted speech detection nor compensation.
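
As a rough illustration of the pipeline the abstract describes (logistic-regression detection followed by GMM-based linear compensation of embeddings), the following is a minimal sketch and not the authors' implementation. It assumes a diagonal-covariance GMM fitted on shouted embeddings and one bias vector per mixture component, and it subtracts the posterior-weighted bias from any embedding flagged as shouted. All function names, parameters, and the bias-subtraction form are assumptions made for illustration; the paper itself compares several compensation variants that are not reproduced here.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x, weights, means, variances):
    """Component posteriors p(k | x) under a diagonal GMM.

    Uses the log-sum-exp trick so that very small densities do not underflow.
    """
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def is_shouted(x, coef, intercept, threshold=0.5):
    """Logistic-regression detector: sigmoid(coef . x + intercept) >= threshold."""
    z = sum(c * xi for c, xi in zip(coef, x)) + intercept
    return 1.0 / (1.0 + math.exp(-z)) >= threshold


def compensate(x, weights, means, variances, biases):
    """Assumed linear compensation: x_hat = x - sum_k p(k | x) * r_k.

    biases[k] is the (hypothetical) bias vector r_k of component k, e.g. an
    average shouted-minus-normal offset estimated on training data.
    """
    post = posteriors(x, weights, means, variances)
    return [
        xi - sum(p * r[d] for p, r in zip(post, biases))
        for d, xi in enumerate(x)
    ]


def process(x, detector, gmm, biases):
    """Compensate only embeddings flagged as shouted; pass others through."""
    coef, intercept = detector
    if is_shouted(x, coef, intercept):
        return compensate(x, *gmm, biases)
    return list(x)
```

In this sketch, `detector` is a `(coef, intercept)` pair and `gmm` is a `(weights, means, variances)` tuple, both of which would be trained offline. Because the per-embedding cost is a handful of Gaussian evaluations and one weighted sum, a step like this fits naturally in the back-end, after embedding extraction and before scoring.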

Original language: English
Title of host publication: Interspeech 2020
Number of pages: 5
Publication date: 2020
Pages: 1511-1515
Publication status: Published - 2020
Event: Interspeech 2020 - Shanghai, China
Duration: 25 Oct 2020 to 29 Oct 2020

Conference

Conference: Interspeech 2020
Country/Territory: China
City: Shanghai
Period: 25/10/2020 to 29/10/2020
Series: Proceedings of the International Conference on Spoken Language Processing
ISSN: 1990-9772

Keywords

  • Domain compensation
  • Shouted speech
  • Speaker verification
  • Vocal effort mismatch
