Review of Various Text Categorization Methods

Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2015, Vol 17, Issue 3

Abstract

Measuring the similarity between documents is an important operation in text processing. Text categorization (TC), also known as text classification or topic spotting, is the task of automatically sorting a set of documents into categories from a predefined set [1]; put another way, it is the task of automatically classifying unlabeled natural language documents into a predefined set of semantic categories [2]. Term weighting methods assign appropriate weights to terms in order to improve the performance of text categorization [1]. The traditional term weighting methods borrowed from information retrieval (IR), such as binary, term frequency (tf), tf.idf, and its variants, are unsupervised term weighting methods, because their calculation does not use the category membership of the training documents. Supervised term weighting methods, by contrast, use this known information in several ways. This raises a fundamental question: "Does the difference between supervised and unsupervised term weighting methods have any relationship with different learning algorithms?" Moreover, if normalized term frequency (ntf) is used in place of term frequency, together with relevance frequency (rf), the resulting method is ntf.rf; is this new method effective for text categorization? We aim to answer these questions by implementing a new supervised and unsupervised term weighting method (ntf.rf). The proposed TC method will be evaluated through a number of experiments on two benchmark text collections, 20NewsGroups and Reuters.
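
The contrast between unsupervised and supervised term weighting can be illustrated with a minimal sketch. It is not the authors' implementation. The rf factor follows the commonly cited form log2(2 + a / max(1, c)), where a and c count the positive-category and negative-category documents containing a term. The abstract does not define ntf, so dividing tf by the largest tf in the document is an assumption made here. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Unsupervised weighting: tf * ln(N / df). Category labels are never used."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()}
        for doc in docs
    ]


def relevance_frequency(docs, labels, positive):
    """Supervised factor rf = log2(2 + a / max(1, c)).

    a = number of positive-category documents containing the term
    c = number of negative-category documents containing the term
    """
    a, c = Counter(), Counter()
    for doc, label in zip(docs, labels):
        target = a if label == positive else c
        target.update(set(doc))  # count each term once per document
    vocab = set(a) | set(c)
    return {t: math.log2(2 + a[t] / max(1, c[t])) for t in vocab}


def ntf_rf(docs, labels, positive):
    """ntf * rf, with ntf ASSUMED to be tf / max tf within the document."""
    rf = relevance_frequency(docs, labels, positive)
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        max_tf = max(counts.values(), default=1)  # guard against empty docs
        weighted.append({t: (tf / max_tf) * rf[t] for t, tf in counts.items()})
    return weighted


if __name__ == "__main__":
    # Hypothetical toy corpus: tokenized documents with one label each.
    docs = [["oil", "price", "oil"], ["oil", "market"], ["match", "goal", "price"]]
    labels = ["crude", "crude", "sport"]
    print(tf_idf(docs)[0])
    print(ntf_rf(docs, labels, positive="crude")[0])
```

In this sketch, tf.idf depends only on how terms are distributed across documents, whereas rf changes with the choice of positive category, which is what makes the second scheme supervised.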

Authors and Affiliations

Chandrashekhar P. Bhamare, Dinesh D. Patil

How To Cite

Chandrashekhar P. Bhamare, Dinesh D. Patil (2015). Review of Various Text Categorization Methods. IOSR Journals (IOSR Journal of Computer Engineering), 17(3), 13-19. https://europub.co.uk/articles/-A-137645