Latent Dirichlet Mixture Model

Jen-Tzung Chien, Chao-Hsi Lee, Zheng-Hua Tan

Publication: Contribution to journal › Journal article › Research › peer-review

4 Citations (Scopus)

Abstract

Text representation based on a latent topic model is seen as a non-Gaussian problem, where the observed words and latent topics are multinomial variables and the topic proportionals are Dirichlet variables. A traditional topic model is established by introducing a single Dirichlet prior to characterize the topic proportionals. The words in a text document are represented by a random mixture of semantic topics. However, in the real world, a single Dirichlet distribution may not faithfully reflect the variations of the topic proportionals estimated from heterogeneous documents. To address these variations, we propose a new latent variable model where latent topics and their proportionals are learned by incorporating a prior based on a Dirichlet mixture model. The resulting latent Dirichlet mixture model (LDMM) is constructed for topic clustering as well as document clustering. Multiple Dirichlets provide a solution for building structural latent variables in learning representations over a variety of topics. This study carries out the inference for LDMM according to the variational Bayes and the collapsed variational Bayes. Such an unsupervised LDMM is further extended to a supervised LDMM for text classification. Experiments on document representation, summarization and classification show the merit of the structural prior in LDMM topic models.
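The key idea in the abstract, replacing the single Dirichlet prior over per-document topic proportions with a mixture of Dirichlets, can be illustrated with a short generative sketch. This is a hypothetical illustration, not the authors' code; the number of topics, mixture weights, and Dirichlet parameters below are assumed values chosen only to show how a mixture prior produces multiple modes of topic proportions:

```python
import numpy as np

rng = np.random.default_rng(0)

K = 4                                # number of topics (assumed)
M = 3                                # number of Dirichlet mixture components (assumed)
weights = np.array([0.5, 0.3, 0.2])  # mixture weights (assumed)
alphas = np.array([                  # one Dirichlet parameter vector per component
    [8.0, 1.0, 1.0, 1.0],            # component concentrated on topic 0
    [1.0, 8.0, 1.0, 1.0],            # component concentrated on topic 1
    [1.0, 1.0, 4.0, 4.0],            # component spread over topics 2 and 3
])

def sample_topic_proportions(n_docs):
    """Draw per-document topic proportions from the Dirichlet mixture prior."""
    # First pick a mixture component per document, then draw the
    # topic-proportion vector from that component's Dirichlet.
    comps = rng.choice(M, size=n_docs, p=weights)
    return np.array([rng.dirichlet(alphas[c]) for c in comps])

theta = sample_topic_proportions(5)
# Each row is a valid topic-proportion vector (non-negative, sums to 1),
# but different documents can be drawn from different Dirichlet modes,
# which a single Dirichlet prior cannot express.
```

A standard LDA prior corresponds to the special case `M = 1`; the mixture lets heterogeneous document collections occupy several distinct regions of the topic simplex.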

Original language: English
Journal: Neurocomputing
Volume: 278
Pages (from-to): 12-22
Number of pages: 11
ISSN: 0925-2312
DOI: 10.1016/j.neucom.2017.08.029
Status: Published - 2018


Cite this

Chien, Jen-Tzung ; Lee, Chao-Hsi ; Tan, Zheng-Hua. / Latent Dirichlet Mixture Model. In: Neurocomputing. 2018 ; Vol. 278. pp. 12-22.
@article{d1633bd2086e4e139c1470f7b72c37d4,
title = "Latent Dirichlet Mixture Model",
abstract = "Text representation based on latent topic model is seen as a non-Gaussian problem where the observed words and latent topics are multinomial variables and the topic proportionals are Dirichlet variables. Traditional topic model is established by introducing a single Dirichlet prior to characterize the topic proportionals. The words in a text document are represented by a random mixture of semantic topics. However, in real world, a single Dirichlet distribution may not faithfully reflect the variations of topic proportionals estimated from the heterogeneous documents. To address these variations, we propose a new latent variable model where latent topics and their proportionals are learned by incorporating the prior based on Dirichlet mixture model. The resulting latent Dirichlet mixture model (LDMM) is constructed for topic clustering as well as document clustering. Multiple Dirichlets provide a solution to build structural latent variables in learning representation over a variety of topics. This study carries out the inference for LDMM according to the variational Bayes and the collapsed variational Bayes. Such an unsupervised LDMM is further extended to a supervised LDMM for text classification. Experiments on document representation, summarization and classification show the merit of structural prior in LDMM topic models.",
author = "Jen-Tzung Chien and Chao-Hsi Lee and Zheng-Hua Tan",
year = "2018",
doi = "10.1016/j.neucom.2017.08.029",
language = "English",
volume = "278",
pages = "12--22",
journal = "Neurocomputing",
issn = "0925-2312",
publisher = "Elsevier",

}

Latent Dirichlet Mixture Model. / Chien, Jen-Tzung; Lee, Chao-Hsi; Tan, Zheng-Hua.

In: Neurocomputing, Vol. 278, 2018, pp. 12-22.


TY - JOUR

T1 - Latent Dirichlet Mixture Model

AU - Chien, Jen-Tzung

AU - Lee, Chao-Hsi

AU - Tan, Zheng-Hua

PY - 2018

Y1 - 2018

N2 - Text representation based on latent topic model is seen as a non-Gaussian problem where the observed words and latent topics are multinomial variables and the topic proportionals are Dirichlet variables. Traditional topic model is established by introducing a single Dirichlet prior to characterize the topic proportionals. The words in a text document are represented by a random mixture of semantic topics. However, in real world, a single Dirichlet distribution may not faithfully reflect the variations of topic proportionals estimated from the heterogeneous documents. To address these variations, we propose a new latent variable model where latent topics and their proportionals are learned by incorporating the prior based on Dirichlet mixture model. The resulting latent Dirichlet mixture model (LDMM) is constructed for topic clustering as well as document clustering. Multiple Dirichlets provide a solution to build structural latent variables in learning representation over a variety of topics. This study carries out the inference for LDMM according to the variational Bayes and the collapsed variational Bayes. Such an unsupervised LDMM is further extended to a supervised LDMM for text classification. Experiments on document representation, summarization and classification show the merit of structural prior in LDMM topic models.

AB - Text representation based on latent topic model is seen as a non-Gaussian problem where the observed words and latent topics are multinomial variables and the topic proportionals are Dirichlet variables. Traditional topic model is established by introducing a single Dirichlet prior to characterize the topic proportionals. The words in a text document are represented by a random mixture of semantic topics. However, in real world, a single Dirichlet distribution may not faithfully reflect the variations of topic proportionals estimated from the heterogeneous documents. To address these variations, we propose a new latent variable model where latent topics and their proportionals are learned by incorporating the prior based on Dirichlet mixture model. The resulting latent Dirichlet mixture model (LDMM) is constructed for topic clustering as well as document clustering. Multiple Dirichlets provide a solution to build structural latent variables in learning representation over a variety of topics. This study carries out the inference for LDMM according to the variational Bayes and the collapsed variational Bayes. Such an unsupervised LDMM is further extended to a supervised LDMM for text classification. Experiments on document representation, summarization and classification show the merit of structural prior in LDMM topic models.

U2 - 10.1016/j.neucom.2017.08.029

DO - 10.1016/j.neucom.2017.08.029

M3 - Journal article

VL - 278

SP - 12

EP - 22

JO - Neurocomputing

JF - Neurocomputing

SN - 0925-2312

ER -