Ensemble Methods for Improving Classification of Data Produced by Latent Dirichlet Allocation
Journal Title: Computer Science and Mathematical Modelling - Year 2018, Vol 0, Issue 8
Abstract
Topic models are widely used methods of text analysis. The most popular topic modelling algorithm is LDA (Latent Dirichlet Allocation). Recently, many new methods have been proposed that enable the use of this model in large-scale processing. One problem is that a data scientist has to choose the number of topics manually, a step that requires some prior analysis. A few methods have been proposed to automate this step, but none of them works very well when LDA is used as a preprocessing step for subsequent classification. In this paper, we propose an ensemble approach that allows us to use more than one model in the prediction phase while reducing the need to find a single best number of topics. We have also analyzed a few methods of estimating the number of topics.
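The abstract does not say how the individual models are combined, so the following is only a minimal sketch of one common option, soft voting. It assumes that a separate classifier has already been trained on the features of each LDA model (one per candidate number of topics) and that each classifier outputs class probabilities. The function name `soft_vote`, the topic counts, and the probabilities are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def soft_vote(per_model_probs: Dict[int, Sequence[Sequence[float]]]) -> List[int]:
    """Average class probabilities across models; return the argmax class per document.

    per_model_probs maps a topic count K (hypothetical key) to a matrix of
    shape (n_documents, n_classes), assumed to come from a classifier trained
    on the K-topic LDA representation of the documents.
    """
    matrices = list(per_model_probs.values())
    if not matrices:
        raise ValueError("need at least one model")
    n_docs = len(matrices[0])
    n_classes = len(matrices[0][0])
    predictions = []
    for d in range(n_docs):
        # Mean probability of each class over all models for document d.
        avg = [sum(m[d][c] for m in matrices) / len(matrices) for c in range(n_classes)]
        # Pick the class with the highest averaged probability.
        predictions.append(max(range(n_classes), key=avg.__getitem__))
    return predictions


# Made-up probabilities for two documents and two classes,
# from three models with 10, 20 and 50 topics (illustrative values only).
example = {
    10: [[0.6, 0.4], [0.3, 0.7]],
    20: [[0.4, 0.6], [0.2, 0.8]],
    50: [[0.7, 0.3], [0.6, 0.4]],
}
predicted = soft_vote(example)
```

Because the prediction is an average over several topic counts, no single value has to be selected in advance. Other combination rules, such as majority voting or stacking, would fit the same interface.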
Authors and Affiliations
Maciej Jankowski