Ensemble Methods for Improving Classification of Data Produced by Latent Dirichlet Allocation

Journal Title: Computer Science and Mathematical Modelling - Year 2018, Vol 0, Issue 8

Abstract

Topic models are popular methods of text analysis, and Latent Dirichlet Allocation (LDA) is the most widely used topic modelling algorithm. Many methods have recently been proposed that enable the use of this model in large-scale processing. One problem is that the data scientist has to choose the number of topics manually, a step that requires prior analysis. A few methods have been proposed to automate this step, but none of them works very well when LDA is used as a preprocessing step for further classification. In this paper, we propose an ensemble approach that allows more than one model to be used in the prediction phase while reducing the need to find a single best number of topics. We also analyze a few methods of estimating the number of topics.
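The abstract does not say how the ensemble is built, so the following is only a minimal illustrative sketch of the general idea, not the method of the paper. It assumes scikit-learn, a logistic regression classifier on top of the LDA topic proportions, and plain averaging of predicted class probabilities. The toy documents, the labels, the topic counts (2, 3, 5), and the helper names `fit_lda_ensemble`, `predict_proba_ensemble`, and `predict_ensemble` are all hypothetical.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def fit_lda_ensemble(docs, labels, topic_counts=(2, 3, 5), seed=0):
    """Fit one (bag of words -> LDA -> classifier) pipeline per topic count."""
    members = []
    for k in topic_counts:
        pipe = make_pipeline(
            CountVectorizer(),
            # LDA maps each document to a k-dimensional topic-proportion vector.
            LatentDirichletAllocation(n_components=k, random_state=seed),
            # Assumed classifier; any probabilistic classifier could be used.
            LogisticRegression(max_iter=1000),
        )
        members.append(pipe.fit(docs, labels))
    return members


def predict_proba_ensemble(members, docs):
    """Average class probabilities over all members (assumed combination rule)."""
    return np.mean([m.predict_proba(docs) for m in members], axis=0)


def predict_ensemble(members, docs):
    """Return the class with the highest averaged probability."""
    proba = predict_proba_ensemble(members, docs)
    # Every member saw the same labels, so the class order is identical.
    return members[0].classes_[np.argmax(proba, axis=1)]


# Hypothetical toy corpus, for illustration only.
docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "the coach praised the players after the game",
    "fans cheered the team in the stadium",
    "the new processor speeds up the laptop",
    "the software update fixes a security bug",
    "developers released a new programming library",
    "the server stores data in the cloud",
]
labels = ["sport"] * 4 + ["tech"] * 4

members = fit_lda_ensemble(docs, labels)
predictions = predict_ensemble(members, docs)
```

Because every member votes, no single topic count has to be selected in advance; a poorly chosen count only dilutes the average rather than deciding the prediction on its own.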

Authors and Affiliations

Maciej Jankowski

  • EP ID EP519291
  • DOI 10.5604/01.3001.0013.1458

How To Cite

Maciej Jankowski (2018). Ensemble Methods for Improving Classification of Data Produced by Latent Dirichlet Allocation. Computer Science and Mathematical Modelling, 0(8), 17-28. https://europub.co.uk/articles/-A-519291