Frequent Itemset-based Text Clustering Approach to Cluster Ranked Documents

Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2014, Vol 16, Issue 4

Abstract

Most search engines retrieve documents and rank them by relevance, which is not necessarily the same as ranking them by the similarity between the query and each document, and they present the results as a flat list. There is therefore a need to rank the retrieved documents by their similarity to the query. We implement a language model that efficiently retrieves the text documents satisfying a given query, and we rank the documents by the similarity coefficient calculated with that model. To cluster the ranked, retrieved documents, we use the FITC (Frequent Itemset-based Text Clustering) algorithm, which partitions the documents and returns the clusters in each partition; the clusters it identifies do not overlap. The system accepts a user request and returns all relevant documents that satisfy the query, partitioned into clusters. The results of applying the algorithm to document retrieval demonstrate that it identifies non-overlapping clusters and is therefore suited to widespread use in search engines.
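The two stages described in the abstract, ranking with a language model and then grouping the ranked documents into non-overlapping clusters by shared frequent term sets, can be illustrated with a minimal sketch. This is not the authors' FITC implementation: the abstract does not give the scoring formula or the clustering details, so the query-likelihood score with Jelinek-Mercer smoothing, the brute-force itemset enumeration, the greedy assignment rule, and every function and parameter name below (`rank_by_query_likelihood`, `cluster_non_overlapping`, `lam`, `min_support`, `max_size`) are assumptions made for illustration only.

```python
import math
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Naive whitespace tokenizer (assumption; the paper's preprocessing is not given).
    return text.lower().split()


def rank_by_query_likelihood(query, docs, lam=0.5):
    """Return document indices sorted by log P(query | document), best first.

    Assumed scoring: query likelihood with Jelinek-Mercer smoothing.
    """
    doc_tokens = [tokenize(d) for d in docs]
    collection = Counter(t for toks in doc_tokens for t in toks)
    coll_len = sum(collection.values())
    scored = []
    for idx, toks in enumerate(doc_tokens):
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if collection[term] == 0:
                continue  # term unseen in the whole collection: skip it
            p_doc = tf[term] / len(toks) if toks else 0.0
            p_coll = collection[term] / coll_len
            score += math.log(lam * p_doc + (1 - lam) * p_coll)
        scored.append((score, idx))
    return [idx for _, idx in sorted(scored, key=lambda s: (-s[0], s[1]))]


def frequent_itemsets(doc_terms, min_support, max_size=2):
    """Brute-force term sets (size 1..max_size) found in >= min_support documents."""
    vocab = sorted(set().union(*doc_terms))
    found = {}
    for size in range(1, max_size + 1):
        for items in combinations(vocab, size):
            cover = [i for i, terms in enumerate(doc_terms) if set(items) <= terms]
            if len(cover) >= min_support:
                found[items] = cover
    return found


def cluster_non_overlapping(ranked_ids, docs, min_support=2):
    """Greedily assign each ranked document to at most one frequent-itemset cluster."""
    doc_terms = [set(tokenize(docs[i])) for i in ranked_ids]
    itemsets = frequent_itemsets(doc_terms, min_support)
    # Prefer longer itemsets, then itemsets covering more documents.
    order = sorted(itemsets, key=lambda s: (-len(s), -len(itemsets[s]), s))
    assigned, clusters = set(), []
    for items in order:
        free = [i for i in itemsets[items] if i not in assigned]
        if len(free) >= min_support:
            clusters.append((items, [ranked_ids[i] for i in free]))
            assigned.update(free)
    leftovers = [ranked_ids[i] for i in range(len(ranked_ids)) if i not in assigned]
    return clusters, leftovers


# Toy data, invented for this example.
docs = [
    "apple banana fruit salad",
    "apple banana smoothie",
    "car engine repair",
    "car engine oil",
    "weather today",
]
ranked = rank_by_query_likelihood("apple car", docs)
clusters, leftovers = cluster_non_overlapping(ranked, docs)
```

Within each cluster the members keep their rank order, and a document that shares no frequent term set with at least `min_support - 1` other unassigned documents is returned in `leftovers` rather than forced into a cluster. A practical system would replace the brute-force enumeration with a frequent-itemset miner such as Apriori or FP-Growth.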

Authors and Affiliations

Snehalata Nandanwar, Geetanjali Kale, Sheetal Sonawane

  • EP ID EP131843
  • DOI 10.9790/0661-16456672

How To Cite

Snehalata Nandanwar, Geetanjali Kale, Sheetal Sonawane (2014). Frequent Itemset-based Text Clustering Approach to Cluster Ranked Documents. IOSR Journals (IOSR Journal of Computer Engineering), 16(4), 66-72. https://europub.co.uk/articles/-A-131843