Text-Independent Speaker Identification Using the Histogram Transform Model

Zhanyu Ma; Hong Yu; Zheng-Hua Tan; Jun Guo

doi:10.1109/ACCESS.2016.2646458

Department of Electronic Systems

Publication: Contribution to journal › Journal article › Research › peer-review

24 Citations (Scopus)

322 Downloads (Pure)

Abstract

In this paper, we propose a novel probabilistic method for the task of text-independent speaker identification (SI). In order to capture the dynamic information during SI, we design super-MFCC features by cascading three neighboring Mel-frequency cepstral coefficient (MFCC) frames together. These super-MFCC vectors are utilized for probabilistic model training such that the speaker’s characteristics can be sufficiently captured. The probability density function (PDF) of the aforementioned super-MFCC features is estimated by the recently proposed histogram transform (HT) method. To reduce the commonly occurring discontinuity problem in multivariate histogram computation, more training data are generated by the HT method. Using these generated data, a smooth PDF of the super-MFCC vectors is obtained. Compared with typical PDF estimation methods, such as the Gaussian mixture model, promising improvements have been obtained by employing the HT-based model in SI.
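The frame-cascading step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `stack_neighboring_frames`, the centred window (previous, current, next frame), and the choice to drop edge frames are all assumptions made here for illustration, and the toy input uses two coefficients per frame rather than a realistic MFCC dimension.

```python
from typing import List, Sequence


def stack_neighboring_frames(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate each frame with its previous and next neighbour.

    Assumption: the window is centred on frame t (t-1, t, t+1) and the first
    and last frames, which lack one neighbour, are dropped. The paper may
    handle the window position and the edges differently.
    """
    stacked = []
    for t in range(1, len(frames) - 1):
        # One "super" vector is three times as long as a single frame.
        stacked.append(list(frames[t - 1]) + list(frames[t]) + list(frames[t + 1]))
    return stacked


# Toy input: 4 frames with 2 coefficients each (hypothetical values).
mfcc = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]
super_mfcc = stack_neighboring_frames(mfcc)
```

With four input frames, this sketch yields two stacked vectors of six values each; in the paper's setting these stacked vectors would then be passed to the density estimator.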

Original language	English
Article number	7803586
Journal	IEEE Access
Volume	4
Pages (from-to)	9733-9739
Number of pages	6
ISSN	2169-3536
DOI	https://doi.org/10.1109/ACCESS.2016.2646458
Status	Published - 2016

Access to Document

10.1109/ACCESS.2016.2646458

Open access article: Publisher's published version, 2.48 MB

http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7803586

Citation formats

@article{a7c67af1e0cf48a499870f1fc245bded,

title = "Text-Independent Speaker Identification Using the Histogram Transform Model",

abstract = "In this paper, we propose a novel probabilistic method for the task of text-independent speaker identification (SI). In order to capture the dynamic information during SI, we design super-MFCC features by cascading three neighboring Mel-frequency cepstral coefficient (MFCC) frames together. These super-MFCC vectors are utilized for probabilistic model training such that the speaker{\textquoteright}s characteristics can be sufficiently captured. The probability density function (PDF) of the aforementioned super-MFCC features is estimated by the recently proposed histogram transform (HT) method. To reduce the commonly occurring discontinuity problem in multivariate histogram computation, more training data are generated by the HT method. Using these generated data, a smooth PDF of the super-MFCC vectors is obtained. Compared with typical PDF estimation methods, such as the Gaussian mixture model, promising improvements have been obtained by employing the HT-based model in SI.",

author = "Zhanyu Ma and Hong Yu and Zheng-Hua Tan and Jun Guo",

year = "2016",

doi = "10.1109/ACCESS.2016.2646458",

language = "English",

volume = "4",

pages = "9733--9739",

journal = "IEEE Access",

issn = "2169-3536",

publisher = "IEEE",

}

TY - JOUR

T1 - Text-Independent Speaker Identification Using the Histogram Transform Model

AU - Ma, Zhanyu

AU - Yu, Hong

AU - Tan, Zheng-Hua

AU - Guo, Jun

PY - 2016

Y1 - 2016

N2 - In this paper, we propose a novel probabilistic method for the task of text-independent speaker identification (SI). In order to capture the dynamic information during SI, we design super-MFCC features by cascading three neighboring Mel-frequency cepstral coefficient (MFCC) frames together. These super-MFCC vectors are utilized for probabilistic model training such that the speaker’s characteristics can be sufficiently captured. The probability density function (PDF) of the aforementioned super-MFCC features is estimated by the recently proposed histogram transform (HT) method. To reduce the commonly occurring discontinuity problem in multivariate histogram computation, more training data are generated by the HT method. Using these generated data, a smooth PDF of the super-MFCC vectors is obtained. Compared with typical PDF estimation methods, such as the Gaussian mixture model, promising improvements have been obtained by employing the HT-based model in SI.

AB - In this paper, we propose a novel probabilistic method for the task of text-independent speaker identification (SI). In order to capture the dynamic information during SI, we design super-MFCC features by cascading three neighboring Mel-frequency cepstral coefficient (MFCC) frames together. These super-MFCC vectors are utilized for probabilistic model training such that the speaker’s characteristics can be sufficiently captured. The probability density function (PDF) of the aforementioned super-MFCC features is estimated by the recently proposed histogram transform (HT) method. To reduce the commonly occurring discontinuity problem in multivariate histogram computation, more training data are generated by the HT method. Using these generated data, a smooth PDF of the super-MFCC vectors is obtained. Compared with typical PDF estimation methods, such as the Gaussian mixture model, promising improvements have been obtained by employing the HT-based model in SI.

U2 - 10.1109/ACCESS.2016.2646458

DO - 10.1109/ACCESS.2016.2646458

M3 - Journal article

SN - 2169-3536

VL - 4

SP - 9733

EP - 9739

JO - IEEE Access

JF - IEEE Access

M1 - 7803586

ER -
