Reducing Dimensionality in Text Mining using Conjugate Gradients and Hybrid Cholesky Decomposition

Abstract

Data mining in large datasets faces several limitations in identifying the data relevant to a given query. These include a lack of interaction in the required objective space; an inability to handle datasets with discrete variables, especially when values are missing; an inability to classify records according to the query; and poor generation of explicit knowledge for a query, which increases the dimensionality of the data. This paper aims to resolve the problem of increasing data dimensionality using a modified non-negative matrix factorization (NMF). The additional dimensionality that arises from the non-orthogonality of NMF is resolved with Cholesky decomposition (cdNMF). First, the datasets are structured to form a well-defined geometric structure. Next, the complex conjugate values are extracted and the conjugate gradient algorithm is applied to reduce the sparse matrix obtained from the data vector. cdNMF is then used to extract the feature vector from the dataset, and the data vector is linearly mapped using the upper triangular matrix obtained from the Cholesky decomposition. The method is evaluated using accuracy and normalized mutual information (NMI) on three text databases of varied patterns. The results show that, in finding the documents that match a query, the proposed technique fits larger instances better than NMF, neighborhood preserving non-negative matrix factorization (NPNMF), multiple manifolds non-negative matrix factorization (MMNMF), robust non-negative matrix factorization (RNMF), graph regularized non-negative matrix factorization (GNMF), hierarchical non-negative matrix factorization (HNMF) and cdNMF.
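
The abstract names three building blocks (NMF, a Cholesky-based mapping through an upper triangular matrix, and the conjugate gradient algorithm) without giving their formulation. The following Python sketch illustrates one plausible reading of each block and is not the paper's algorithm. It assumes plain Lee-Seung multiplicative updates for NMF, a Cholesky factorization of the Gram matrix W^T W to orthogonalize the basis, and a textbook conjugate gradient solver for projecting a new document onto the basis. All function names and parameters (`nmf`, `cholesky_map`, `conjugate_gradient`, `ridge`) are hypothetical.

```python
import numpy as np

def nmf(X, k, iters=200, seed=0, eps=1e-9):
    """Plain multiplicative-update NMF: X (terms x docs) ~= W @ H, with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def cholesky_map(W, H, ridge=1e-10):
    """Assumed reading of the Cholesky step: factor W^T W = R^T R (R upper
    triangular), so Q = W R^{-1} has orthonormal columns and Z = R H holds
    the document coordinates in that orthogonal basis (Q @ Z == W @ H)."""
    G = W.T @ W + ridge * np.eye(W.shape[1])  # small ridge keeps G positive definite
    R = np.linalg.cholesky(G).T               # numpy returns the lower factor
    Q = np.linalg.solve(R.T, W.T).T           # solves Q @ R = W without forming inv(R)
    return Q, R, R @ H

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Textbook conjugate gradient for A x = b, A symmetric positive definite."""
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Toy usage on a random non-negative "term-document" matrix.
X = np.random.default_rng(1).random((30, 20))
W, H = nmf(X, k=4)
Q, R, Z = cholesky_map(W, H)
# Least-squares coordinates of one document via the normal equations.
h_new = conjugate_gradient(W.T @ W, W.T @ X[:, 0])
```

In this reading, the non-orthogonality of the NMF basis W is absorbed into R, so distances between the columns of Z match distances between the reconstructed documents. Note that Q and Z are generally no longer non-negative.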

Authors and Affiliations

Jasem M. Alostad

  • EP ID EP260200
  • DOI 10.14569/IJACSA.2017.080716

How To Cite

Jasem M. Alostad (2017). Reducing Dimensionality in Text Mining using Conjugate Gradients and Hybrid Cholesky Decomposition. International Journal of Advanced Computer Science & Applications, 8(7), 110-116. https://europub.co.uk/articles/-A-260200