Improved Vocal Effort Transfer Vector Estimation for Vocal Effort-Robust Speaker Verification

Ivan Lopez Espejo, Santiago Prieto-Calero, Alfonso Ortega, Eduardo Lleida

Publication: Contribution to book/anthology/report/conference proceeding; Conference article in proceedings; Research; Peer-reviewed

Abstract

Despite the maturity of modern speaker verification technology, its performance still significantly degrades when facing non-neutrally-phonated (e.g., shouted and whispered) speech. To address this issue, in this paper, we propose a new speaker embedding compensation method based on a minimum mean square error (MMSE) estimator. This method models the joint distribution of the vocal effort transfer vector and non-neutrally-phonated embedding spaces and operates in a principal component analysis domain to cope with non-neutrally-phonated speech data scarcity. Experiments are carried out using a cutting-edge speaker verification system integrating a powerful self-supervised pre-trained model for speech representation. In comparison with a state-of-the-art embedding compensation method, the proposed MMSE estimator yields superior and competitive equal error rate results when tackling shouted and whispered speech, respectively.
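
The abstract does not spell out the estimator, so the following is only a rough sketch of the general idea rather than the authors' implementation. It assumes that the vocal effort transfer vector is the difference between a neutral embedding and its non-neutrally-phonated counterpart, that paired embeddings are available for training, and that the joint distribution in the PCA domain is a single Gaussian, in which case the MMSE estimate is linear. The function names, the `ridge` regularizer, and the number of components `k` are all hypothetical.

```python
import numpy as np

def fit_pca(X, k):
    """Return the mean and the top-k principal directions of the rows of X."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def fit_mmse(X_non, X_neu, k, ridge=1e-6):
    """Fit a linear-Gaussian MMSE estimator of the transfer vector.

    X_non, X_neu: paired (n, d) arrays of non-neutral and neutral embeddings.
    Assumption: transfer vector v = neutral - non-neutral.
    """
    V = X_neu - X_non
    mx, Px = fit_pca(X_non, k)
    mv, Pv = fit_pca(V, k)
    Zx = (X_non - mx) @ Px.T          # non-neutral embeddings in PCA domain
    Zv = (V - mv) @ Pv.T              # transfer vectors in PCA domain
    n = len(Zx)
    Sxx = Zx.T @ Zx / n + ridge * np.eye(k)   # regularized covariance
    Svx = Zv.T @ Zx / n                        # cross-covariance
    # Under a joint Gaussian, E[zv | zx] = Svx Sxx^{-1} zx (both zero-mean).
    A = np.linalg.solve(Sxx, Svx.T).T
    return {"mx": mx, "Px": Px, "mv": mv, "Pv": Pv, "A": A}

def compensate(model, x):
    """Add the estimated transfer vector to one embedding (d,) or a batch (n, d)."""
    zx = (x - model["mx"]) @ model["Px"].T
    zv_hat = zx @ model["A"].T
    v_hat = model["mv"] + zv_hat @ model["Pv"]
    return x + v_hat
```

In such a sketch, `fit_mmse` would be run once on the paired training embeddings, and `compensate` would be applied to each shouted or whispered test embedding before scoring. Working with only `k` principal components keeps the number of covariance parameters small, which is one plausible reading of how the PCA domain helps with data scarcity.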

Original language: English
Title: Proceedings of the 2023 IEEE 33rd International Workshop on Machine Learning for Signal Processing, MLSP 2023
Editors: Danilo Comminiello, Michele Scarpiniti
Number of pages: 6
Publisher: IEEE
Publication date: 23 Oct 2023
Article number: 10285923
ISBN (Print): 979-8-3503-2412-9
ISBN (Electronic): 979-8-3503-2411-2
DOI
Status: Published - 23 Oct 2023
Event: 2023 IEEE 33rd International Workshop on Machine Learning for Signal Processing (MLSP) - Rome, Italy
Duration: 17 Sep 2023 to 20 Sep 2023

Conference

Conference: 2023 IEEE 33rd International Workshop on Machine Learning for Signal Processing (MLSP)
Country/Territory: Italy
City: Rome
Period: 17/09/2023 to 20/09/2023
Name: IEEE Workshop on Machine Learning for Signal Processing
ISSN: 1551-2541
