Text-Independent Speaker Identification Using the Histogram Transform Model

Zhanyu Ma, Hong Yu, Zheng-Hua Tan, Jun Guo

Research output: Contribution to journal › Journal article › Research › peer-review

15 Citations (Scopus)
145 Downloads (Pure)

Abstract

In this paper, we propose a novel probabilistic method for the task of text-independent speaker identification (SI). To capture the dynamic information during SI, we design super-MFCC features by cascading three neighboring Mel-frequency cepstral coefficient (MFCC) frames together. These super-MFCC vectors are used for probabilistic model training so that the speaker’s characteristics can be sufficiently captured. The probability density function (PDF) of the super-MFCC features is estimated by the recently proposed histogram transform (HT) method. To alleviate the discontinuity problem that commonly occurs when computing multivariate histograms, more training data are generated by the HT method. Using these generated data, a smooth PDF of the super-MFCC vectors is obtained. Compared with typical PDF estimation methods, such as the Gaussian mixture model, promising improvements have been obtained by employing the HT-based model in SI.
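
The cascading step described in the abstract can be illustrated with a minimal sketch. It assumes that "three neighboring frames" means each frame together with the two frames that follow it, and that MFCC frames are already available as rows of numbers; the function name `stack_super_mfcc` and the toy values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def stack_super_mfcc(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate every run of three consecutive frames into one vector.

    frames: one MFCC vector per time step, all of the same length D.
    Returns len(frames) - 2 vectors, each of length 3 * D.
    """
    if len(frames) < 3:
        raise ValueError("need at least three frames to build one super-frame")
    # Assumption: super-frame t is [frame t, frame t+1, frame t+2].
    return [
        list(frames[t]) + list(frames[t + 1]) + list(frames[t + 2])
        for t in range(len(frames) - 2)
    ]


# Toy input: 5 frames of 4 coefficients each (placeholder values, not real MFCCs).
toy_frames = [[float(4 * t + k) for k in range(4)] for t in range(5)]
super_frames = stack_super_mfcc(toy_frames)
print(len(super_frames), len(super_frames[0]))  # 3 12
```

Under these assumptions, an utterance with T frames of dimension D yields T - 2 super-frames of dimension 3D, which would then serve as the training vectors for the density model.
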
Original language: English
Article number: 7803586
Journal: IEEE Access
Volume: 4
Pages (from-to): 9733-9739
Number of pages: 6
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2016.2646458
Status: Published - 2016

Fingerprint

Probability density function
Identification (control systems)
Mathematical transformations
Statistical Models

Cite this

Ma, Zhanyu ; Yu, Hong ; Tan, Zheng-Hua ; Guo, Jun. / Text-Independent Speaker Identification Using the Histogram Transform Model. In: IEEE Access. 2016 ; Vol. 4. pp. 9733-9739.
@article{a7c67af1e0cf48a499870f1fc245bded,
title = "Text-Independent Speaker Identification Using the Histogram Transform Model",
abstract = "In this paper, we propose a novel probabilistic method for the task of text-independent speaker identification (SI). To capture the dynamic information during SI, we design super-MFCC features by cascading three neighboring Mel-frequency cepstral coefficient (MFCC) frames together. These super-MFCC vectors are used for probabilistic model training so that the speaker’s characteristics can be sufficiently captured. The probability density function (PDF) of the super-MFCC features is estimated by the recently proposed histogram transform (HT) method. To alleviate the discontinuity problem that commonly occurs when computing multivariate histograms, more training data are generated by the HT method. Using these generated data, a smooth PDF of the super-MFCC vectors is obtained. Compared with typical PDF estimation methods, such as the Gaussian mixture model, promising improvements have been obtained by employing the HT-based model in SI.",
author = "Zhanyu Ma and Hong Yu and Zheng-Hua Tan and Jun Guo",
year = "2016",
doi = "10.1109/ACCESS.2016.2646458",
language = "English",
volume = "4",
pages = "9733--9739",
journal = "IEEE Access",
issn = "2169-3536",
publisher = "IEEE",

}

Text-Independent Speaker Identification Using the Histogram Transform Model. / Ma, Zhanyu; Yu, Hong; Tan, Zheng-Hua; Guo, Jun.

In: IEEE Access, Vol. 4, 7803586, 2016, pp. 9733-9739.


TY - JOUR

T1 - Text-Independent Speaker Identification Using the Histogram Transform Model

AU - Ma, Zhanyu

AU - Yu, Hong

AU - Tan, Zheng-Hua

AU - Guo, Jun

PY - 2016

Y1 - 2016

AB - In this paper, we propose a novel probabilistic method for the task of text-independent speaker identification (SI). To capture the dynamic information during SI, we design super-MFCC features by cascading three neighboring Mel-frequency cepstral coefficient (MFCC) frames together. These super-MFCC vectors are used for probabilistic model training so that the speaker’s characteristics can be sufficiently captured. The probability density function (PDF) of the super-MFCC features is estimated by the recently proposed histogram transform (HT) method. To alleviate the discontinuity problem that commonly occurs when computing multivariate histograms, more training data are generated by the HT method. Using these generated data, a smooth PDF of the super-MFCC vectors is obtained. Compared with typical PDF estimation methods, such as the Gaussian mixture model, promising improvements have been obtained by employing the HT-based model in SI.

U2 - 10.1109/ACCESS.2016.2646458

DO - 10.1109/ACCESS.2016.2646458

M3 - Journal article

VL - 4

SP - 9733

EP - 9739

JO - IEEE Access

JF - IEEE Access

SN - 2169-3536

M1 - 7803586

ER -