A Novel Approach to Speaker Weight Estimation Using a Fusion of the i-vector and NFA Frameworks

Publication: Research - peer-reviewed, Journal article

Abstract

This paper proposes a novel approach to automatic speaker weight estimation from spontaneous telephone speech signals. Each utterance is modeled with two frameworks: the i-vector framework, which applies factor analysis to Gaussian Mixture Model (GMM) mean supervectors, and the Non-negative Factor Analysis (NFA) framework, which applies a constrained factor analysis to GMM weight supervectors. The information in both the Gaussian means and the Gaussian weights is then exploited through a feature-level fusion of the i-vectors and the NFA vectors. Finally, least-squares support vector regression is used to estimate the speaker's weight from a given utterance.
The proposed approach is evaluated on spontaneous telephone speech from the National Institute of Standards and Technology (NIST) 2008 and 2010 Speaker Recognition Evaluation corpora. To assess its effectiveness, the method is compared with i-vector-based speaker weight estimation and with an alternative fusion scheme, score-level fusion. Experimental results over 2339 utterances show correlation coefficients between the actual and the estimated weights of 0.49 for female speakers and 0.56 for male speakers, indicating the effectiveness of the proposed method for speaker weight estimation.
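
The pipeline described in the abstract (feature-level fusion, least-squares support vector regression, evaluation by correlation coefficient) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the linear kernel, the regularization value `gamma`, the vector dimensions, and the synthetic data standing in for i-vectors and NFA vectors are all assumptions made for illustration only.

```python
import numpy as np

def fuse_features(ivectors, nfa_vectors):
    """Feature-level fusion: concatenate the two vectors of each utterance."""
    return np.hstack([ivectors, nfa_vectors])

def linear_kernel(A, B):
    """Linear kernel (an assumption; the abstract does not name a kernel)."""
    return A @ B.T

def lssvr_fit(X, y, gamma=10.0):
    """Fit least-squares SVR by solving its linear system.

    [ 0   1^T         ] [ b     ]   [ 0 ]
    [ 1   K + I/gamma ] [ alpha ] = [ y ]
    """
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = linear_kernel(X, X) + np.eye(n) / gamma
    rhs = np.concatenate([[0.0], y])
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]  # bias b, dual weights alpha

def lssvr_predict(X_train, b, alpha, X_test):
    """Predict targets for X_test from the fitted bias and dual weights."""
    return linear_kernel(X_test, X_train) @ alpha + b

def pearson(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic stand-ins (hypothetical dimensions, not real speech features).
rng = np.random.default_rng(0)
ivectors = rng.normal(size=(200, 8))
nfa_vectors = rng.normal(size=(200, 4))
weights_kg = (70.0 + 5.0 * ivectors[:, 0] + 3.0 * nfa_vectors[:, 0]
              + rng.normal(scale=2.0, size=200))

fused = fuse_features(ivectors, nfa_vectors)
X_train, X_test = fused[:150], fused[150:]
y_train, y_test = weights_kg[:150], weights_kg[150:]

b, alpha = lssvr_fit(X_train, y_train)
estimated = lssvr_predict(X_train, b, alpha, X_test)
r = pearson(y_test, estimated)
print(f"correlation between actual and estimated weights: {r:.2f}")
```

In score-level fusion, by contrast, one regressor would be trained per feature type and their outputs combined afterwards; the sketch above covers only the feature-level variant. The correlation it prints reflects the synthetic data and says nothing about the results reported in the paper.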

Details

Original language: English
Journal: Journal of Electrical Systems and Signals
Volume: 3
Issue number: 1
Pages (from-to): 47-55
Number of pages: 8
ISSN: 2322-5483
Status: Published - Feb. 2017
Publication type: Research
Peer review: Yes

ID: 248986250