Text-Independent Speaker Identification Using the Histogram Transform Model

Zhanyu Ma, Hong Yu, Zheng-Hua Tan, Jun Guo

Research output: Contribution to journal, Journal article, Research, peer-review

Abstract

In this paper, we propose a novel probabilistic method for text-independent speaker identification (SI). To capture dynamic information during SI, we design super-MFCC features by cascading three neighboring Mel-frequency cepstral coefficient (MFCC) frames. These super-MFCC vectors are used to train a probabilistic model so that the speaker’s characteristics can be sufficiently captured. The probability density function (PDF) of the super-MFCC features is estimated by the recently proposed histogram transform (HT) method. To alleviate the discontinuity problem that commonly occurs when computing multivariate histograms, the HT method generates additional training data. Using these generated data, a smooth PDF of the super-MFCC vectors is obtained. Compared with typical PDF estimation methods, such as the Gaussian mixture model, the HT-based model yields promising improvements in SI.
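The frame-cascading step described in the abstract can be illustrated with a minimal sketch. It assumes the three neighboring frames are the previous, current, and next frame, and that edge frames without both neighbors are dropped; neither detail is stated in the abstract. The function name `super_mfcc` and the toy two-dimensional frames are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def super_mfcc(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate each frame with its previous and next neighbor.

    Assumption: the first and last frames are skipped because they
    lack one of the two neighbors.
    """
    supers = []
    for t in range(1, len(frames) - 1):
        # One super vector = [frame t-1 | frame t | frame t+1]
        supers.append(list(frames[t - 1]) + list(frames[t]) + list(frames[t + 1]))
    return supers


# Toy input: four 2-dimensional "MFCC" frames (real MFCC frames are longer).
frames = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]
vectors = super_mfcc(frames)
```

With D-dimensional input frames, each resulting vector has 3D dimensions, so the density model sees short-term temporal context rather than a single frame.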
Original language: English
Article number: 7803586
Journal: IEEE Access
Volume: 4
Pages (from-to): 9733-9739
Number of pages: 6
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2016.2646458
Publication status: Published - 2016

Fingerprint

Probability density function
Identification (control systems)
Mathematical transformations
Statistical Models

Cite this

Ma, Zhanyu; Yu, Hong; Tan, Zheng-Hua; Guo, Jun. Text-Independent Speaker Identification Using the Histogram Transform Model. In: IEEE Access. 2016; Vol. 4. pp. 9733-9739.
@article{a7c67af1e0cf48a499870f1fc245bded,
title = "Text-Independent Speaker Identification Using the Histogram Transform Model",
abstract = "In this paper, we propose a novel probabilistic method for text-independent speaker identification (SI). To capture dynamic information during SI, we design super-MFCC features by cascading three neighboring Mel-frequency cepstral coefficient (MFCC) frames. These super-MFCC vectors are used to train a probabilistic model so that the speaker’s characteristics can be sufficiently captured. The probability density function (PDF) of the super-MFCC features is estimated by the recently proposed histogram transform (HT) method. To alleviate the discontinuity problem that commonly occurs when computing multivariate histograms, the HT method generates additional training data. Using these generated data, a smooth PDF of the super-MFCC vectors is obtained. Compared with typical PDF estimation methods, such as the Gaussian mixture model, the HT-based model yields promising improvements in SI.",
author = "Zhanyu Ma and Hong Yu and Zheng-Hua Tan and Jun Guo",
year = "2016",
doi = "10.1109/ACCESS.2016.2646458",
language = "English",
volume = "4",
pages = "9733--9739",
journal = "IEEE Access",
issn = "2169-3536",
publisher = "IEEE",

}

TY - JOUR
T1 - Text-Independent Speaker Identification Using the Histogram Transform Model
AU - Ma, Zhanyu
AU - Yu, Hong
AU - Tan, Zheng-Hua
AU - Guo, Jun
PY - 2016
Y1 - 2016
AB - In this paper, we propose a novel probabilistic method for text-independent speaker identification (SI). To capture dynamic information during SI, we design super-MFCC features by cascading three neighboring Mel-frequency cepstral coefficient (MFCC) frames. These super-MFCC vectors are used to train a probabilistic model so that the speaker’s characteristics can be sufficiently captured. The probability density function (PDF) of the super-MFCC features is estimated by the recently proposed histogram transform (HT) method. To alleviate the discontinuity problem that commonly occurs when computing multivariate histograms, the HT method generates additional training data. Using these generated data, a smooth PDF of the super-MFCC vectors is obtained. Compared with typical PDF estimation methods, such as the Gaussian mixture model, the HT-based model yields promising improvements in SI.
U2 - 10.1109/ACCESS.2016.2646458
DO - 10.1109/ACCESS.2016.2646458
M3 - Journal article
VL - 4
SP - 9733
EP - 9739
JO - IEEE Access
JF - IEEE Access
SN - 2169-3536
M1 - 7803586
ER -