Latent Dirichlet Mixture Model

Jen-Tzung Chien, Chao-Hsi Lee, Zheng-Hua Tan

Research output: Contribution to journal › Journal article › Research › peer-review

4 Citations (Scopus)

Abstract

Text representation based on a latent topic model is a non-Gaussian problem in which the observed words and latent topics are multinomial variables and the topic proportions are Dirichlet variables. The traditional topic model is established by introducing a single Dirichlet prior to characterize the topic proportions. The words in a text document are represented by a random mixture of semantic topics. However, in the real world, a single Dirichlet distribution may not faithfully reflect the variations of the topic proportions estimated from heterogeneous documents. To address these variations, we propose a new latent variable model in which latent topics and their proportions are learned by incorporating a prior based on a Dirichlet mixture model. The resulting latent Dirichlet mixture model (LDMM) is constructed for topic clustering as well as document clustering. Multiple Dirichlet distributions provide a way to build structural latent variables for learning representations over a variety of topics. This study carries out inference for LDMM using variational Bayes and collapsed variational Bayes. The unsupervised LDMM is further extended to a supervised LDMM for text classification. Experiments on document representation, summarization and classification show the merit of the structural prior in LDMM topic models.
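
The abstract describes an LDA-style generative process in which the single Dirichlet prior over the topic proportions is replaced by a mixture of Dirichlet distributions. The Python sketch below illustrates that generative view under common topic-model assumptions; the function and parameter names (generate_ldmm_document, mixture_weights, dirichlet_params, topic_word) are illustrative and are not taken from the paper.

import numpy as np

def generate_ldmm_document(n_words, topic_word, mixture_weights, dirichlet_params, rng):
    # Illustrative sketch of the generative view in the abstract; shapes are assumptions.
    # topic_word:       (K, V) per-topic word distributions
    # mixture_weights:  (M,)   weights of the M Dirichlet mixture components
    # dirichlet_params: (M, K) concentration parameters of each Dirichlet component
    m = rng.choice(len(mixture_weights), p=mixture_weights)   # pick a Dirichlet component
    theta = rng.dirichlet(dirichlet_params[m])                # draw the topic proportions
    words = []
    for _ in range(n_words):
        z = rng.choice(topic_word.shape[0], p=theta)          # draw a latent topic
        w = rng.choice(topic_word.shape[1], p=topic_word[z])  # draw a word from that topic
        words.append(w)
    return m, theta, np.array(words)

# Toy usage: K = 3 topics, V = 5 vocabulary words, M = 2 Dirichlet components.
rng = np.random.default_rng(0)
topic_word = rng.dirichlet(np.ones(5), size=3)
m, theta, doc = generate_ldmm_document(
    n_words=20,
    topic_word=topic_word,
    mixture_weights=np.array([0.6, 0.4]),
    dirichlet_params=np.array([[5.0, 1.0, 1.0], [1.0, 1.0, 5.0]]),
    rng=rng,
)
print("component:", m, "proportions:", np.round(theta, 2), "words:", doc)

In a single-prior topic model there is only one set of concentration parameters, whereas the mixture above lets different groups of documents have differently shaped distributions over their topic proportions, which is the variation across heterogeneous documents that the abstract points to.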

Original language: English
Journal: Neurocomputing
Volume: 278
Pages (from-to): 12-22
Number of pages: 11
ISSN: 0925-2312
DOI: 10.1016/j.neucom.2017.08.029
Publication status: Published - 2018

Fingerprint

Cluster Analysis
Semantics
Learning

Cite this

Chien, Jen-Tzung; Lee, Chao-Hsi; Tan, Zheng-Hua. Latent Dirichlet Mixture Model. In: Neurocomputing. 2018; Vol. 278, pp. 12-22. https://doi.org/10.1016/j.neucom.2017.08.029
@article{d1633bd2086e4e139c1470f7b72c37d4,
  title     = "Latent Dirichlet Mixture Model",
  author    = "Jen-Tzung Chien and Chao-Hsi Lee and Zheng-Hua Tan",
  year      = "2018",
  doi       = "10.1016/j.neucom.2017.08.029",
  language  = "English",
  volume    = "278",
  pages     = "12--22",
  journal   = "Neurocomputing",
  issn      = "0925-2312",
  publisher = "Elsevier",
}

