Reducing Dimensionality in Text Mining using Conjugate Gradients and Hybrid Cholesky Decomposition

Abstract

Data mining over large datasets generally faces several limitations in identifying the records relevant to a given query. These include a lack of interaction in the required objective space; an inability to handle the datasets or their discrete variables, especially in the presence of missing variables; an inability to classify records according to the given query; and poor generation of explicit knowledge for a query, which increases the dimensionality of the data. This paper therefore addresses the problem of increasing data dimensionality using a modified non-negative matrix factorization (NMF). The additional dimensionality arising from the non-orthogonality of NMF is resolved with Cholesky decomposition (cdNMF). First, the datasets are structured to form a well-defined geometric structure. Next, the complex conjugate values are extracted and the conjugate gradient algorithm is applied to reduce the sparse matrix derived from the data vector. cdNMF is then used to extract the feature vector from the dataset, and the data vector is linearly mapped through the upper triangular matrix obtained from the Cholesky decomposition. The approach is evaluated using accuracy and normalized mutual information (NMI) on three text databases with varied patterns. The results show that the proposed technique scales well to larger instances and finds documents matching the query better than NMF, neighborhood preserving nonnegative matrix factorization (NPNMF), multiple manifolds non-negative matrix factorization (MMNMF), robust non-negative matrix factorization (RNMF), graph regularized non-negative matrix factorization (GNMF), hierarchical non-negative matrix factorization (HNMF) and cdNMF.
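
The abstract names the building blocks without giving their exact form, so the following is only a minimal sketch of one plausible reading, not the author's implementation. All of the following are assumptions: NMF is computed with standard Lee-Seung multiplicative updates; the Cholesky step factors the Gram matrix W^T W = R^T R with R upper triangular, so that Q = W R^-1 has orthonormal columns and the linear map H -> R H re-expresses the factors in that orthogonal basis; and conjugate gradients solve the symmetric positive definite normal equations (W^T W) h = W^T q when a new query vector q is folded into the reduced space. The function names nmf, cholesky_map and conjugate_gradient are hypothetical.

```python
import numpy as np


def nmf(X, k, iters=500, seed=0, eps=1e-10):
    """Plain NMF (Lee-Seung multiplicative updates): X (m x n) ~ W (m x k) @ H (k x n)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(iters):
        # Multiplicative updates keep W and H non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def cholesky_map(W, H):
    """Hypothetical cdNMF-style step: factor W^T W = R^T R with R upper triangular.

    Q = W R^-1 has orthonormal columns, and Z = R @ H holds the coordinates of
    the reconstruction W @ H in that orthonormal basis (W @ H == Q @ Z).
    """
    R = np.linalg.cholesky(W.T @ W).T   # numpy returns lower L; R = L^T is upper
    Q = np.linalg.solve(R.T, W.T).T     # solves R^T Q^T = W^T, i.e. Q = W R^-1
    return R, Q, R @ H


def conjugate_gradient(A, b, rtol=1e-10, max_iter=None):
    """Textbook conjugate gradient for a symmetric positive definite A."""
    x = np.zeros(b.shape[0])
    r = b.astype(float).copy()          # residual b - A x for the start x = 0
    p = r.copy()
    rs = r @ r
    stop = (rtol * np.linalg.norm(b)) ** 2
    for _ in range(max_iter or 10 * b.shape[0]):
        if rs <= stop:
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p       # new A-conjugate search direction
        rs = rs_new
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((20, 12))            # toy non-negative "term-document" matrix
    W, H = nmf(X, k=3)
    R, Q, Z = cholesky_map(W, H)
    print(np.allclose(Q.T @ Q, np.eye(3)), np.allclose(Q @ Z, W @ H))
    q = rng.random(20)                  # toy query vector
    h = conjugate_gradient(W.T @ W, W.T @ q)
    print(R @ h)                        # query coordinates in the orthonormal basis
```

Because W H = (W R^-1)(R H), the Cholesky map leaves the reconstruction unchanged and only re-expresses the factors in an orthonormal basis, which is one way to read how the upper triangular matrix addresses the non-orthogonality of NMF.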

Authors and Affiliations

Jasem M. Alostad

  • EP ID EP260200
  • DOI 10.14569/IJACSA.2017.080716

How To Cite

Jasem M. Alostad (2017). Reducing Dimensionality in Text Mining using Conjugate Gradients and Hybrid Cholesky Decomposition. International Journal of Advanced Computer Science & Applications, 8(7), 110-116. https://europub.co.uk/articles/-A-260200