In 2014, we launched our open-access repository which offers full text access to conference proceedings from many of our events including the INC and HAISA series. These papers are free to access and distribute (subject to citing the source).
Ninth International Network Conference (INC 2012)
Title: Wikipedia-Based Efficient Sampling Approach for Topic Model
Author(s): Tong Zhao, Chunping Li, Mengya Li
Keywords: Gibbs sampling, Latent Dirichlet Allocation, Wikipedia, Topic Model
Abstract: In this paper, we propose a novel approach called Wikipedia-based Collapsed Gibbs sampling (Wikipedia-based CGS) to improve the efficiency of the collapsed Gibbs sampling(CGS), which has been widely used in latent Dirichlet Allocation (LDA) model. Conventional CGS method views each word in the documents as an equal status for the topic modeling. Moreover, sampling all the words in the documents always leads to high computational complexity. Considering this crucial drawback of LDA we propose the Wikipedia-based CGS approach that commits to extracting more meaningful topics and improving the efficiency of the sampling process in LDA by distinguishing different statuses of words in the documents for sampling topics with Wikipedia as the background knowledge. The experiments on real world datasets show that our Wikipedia-based approach for collapsed Gibbs sampling can significantly improve the efficiency and have a better perplexity compared to existing approaches.
Download count: 1026
How to get this paper:
PDF copy of this paper is free to download. You may distribute this copy providing you cite this page as the source.