Review of Various Text Categorization Methods

Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2015, Vol 17, Issue 3

Abstract

Measuring the similarity between documents is an important operation in the text processing field. Text categorization (TC), also known as text classification or topic spotting, is the task of automatically sorting a set of documents, typically unlabeled natural language documents, into categories from a predefined set of semantic categories [1], [2]. Term weighting methods assign appropriate weights to terms to improve the performance of text categorization [1]. The traditional term weighting methods borrowed from information retrieval (IR), such as binary, term frequency (tf), tf.idf, and their various variants, are unsupervised term weighting methods, because their calculation does not use information about the category membership of the training documents. Supervised term weighting methods, in contrast, make use of this known information in several ways. This raises a fundamental question: "Does the difference between supervised and unsupervised term weighting methods have any relationship with different learning algorithms?" Furthermore, if normalized term frequency is used instead of raw term frequency together with relevance frequency (rf), the resulting method is ntf.rf; is this new method effective for text categorization? We aim to answer these questions by implementing supervised and unsupervised term weighting methods, including the new ntf.rf method. The proposed TC method will be evaluated through a number of experiments on two benchmark text collections, 20NewsGroups and Reuters.
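The abstract names several term weighting schemes without giving their formulas. The following is a minimal Python sketch, not the authors' implementation, of how binary, tf, tf.idf, and a supervised tf-based scheme could be computed on tokenized documents, plus cosine similarity for comparing the resulting vectors. Several points are assumptions: rf follows the common relevance frequency definition rf(t) = log2(2 + a / max(1, c)), computed one-vs-rest for a single category; ntf is taken to be the term count divided by document length, since the abstract does not say how the normalization is done; idf uses the natural logarithm of N / df(t). All function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def binary_weights(doc):
    """Binary weighting: 1.0 for every term that occurs in the document."""
    return {term: 1.0 for term in set(doc)}


def tf_weights(doc):
    """Raw term frequency: number of occurrences of each term."""
    return {term: float(count) for term, count in Counter(doc).items()}


def idf_table(corpus):
    """idf(t) = ln(N / df(t)), where df(t) counts documents containing t."""
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    return {term: math.log(n_docs / d) for term, d in df.items()}


def tfidf_weights(doc, idf):
    """Unsupervised tf.idf; terms unseen in the training corpus get weight 0."""
    return {term: count * idf.get(term, 0.0) for term, count in Counter(doc).items()}


def rf_table(corpus, labels, positive):
    """Supervised relevance frequency for one category (one-vs-rest).

    ASSUMED definition: rf(t) = log2(2 + a / max(1, c)), where a is the number
    of training documents labeled `positive` that contain t, and c is the
    number of documents in the other categories that contain t.
    """
    a, c = Counter(), Counter()
    for doc, label in zip(corpus, labels):
        (a if label == positive else c).update(set(doc))
    return {t: math.log2(2 + a[t] / max(1, c[t])) for t in set(a) | set(c)}


def ntf_rf_weights(doc, rf):
    """ntf.rf, ASSUMING ntf = term count / document length.

    Terms absent from the rf table get rf = log2(2 + 0) = 1.0.
    """
    counts = Counter(doc)
    length = sum(counts.values())
    return {t: (n / length) * rf.get(t, 1.0) for t, n in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus with two categories, "grain" and "sport".
corpus = [
    ["wheat", "price", "rises"],
    ["wheat", "export", "wheat"],
    ["team", "wins", "match"],
    ["match", "price", "tickets"],
]
labels = ["grain", "grain", "sport", "sport"]

rf = rf_table(corpus, labels, positive="grain")
doc_vectors = [ntf_rf_weights(doc, rf) for doc in corpus]
print(rf["wheat"], rf["match"])  # 2.0 1.0
print(cosine(doc_vectors[0], doc_vectors[1]))
```

Because rf in this sketch is computed for one category against the rest, a multi-category collection such as Reuters would need one rf table, and therefore one set of weighted document vectors, per category.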

Authors and Affiliations

Chandrashekhar P. Bhamare, Dinesh D. Patil

Related Articles

Detection and Prevention of Selfish Attack in MANET using Dynamic Learning

Abstract: In this paper we deal with misbehaving nodes in mobile ad hoc networks (MANETs) that drop packets supposed to be relayed, whose purpose may be either saving their resources or launching a DoS attack. We propose...

Direction-Length Code (DLC) To Represent Binary Objects

Abstract: More and more images have been generated in digital form around the world. Efficient way of description and classification of objects is a well needed application to identify the objects present in images...

A Structured and Layered Approach for a Modular Electronic Voting System: Defining the Security Service and the Network Access Layers

Abstract: This is the second part of the series of works that attempt to solve the problem of non-modularity in electronic voting systems. The work analyzed the second layer (Security Service layer) of our proposed struc...

Security Issues and Privacy in Cloud Computing

Abstract: Recent advances have given rise to the popularity and success of cloud computing. However, when outsourcing the data and business application to a third party causes the security and privacy issues to become a...

Intrusion Detection Techniques In Mobile Networks

Abstract: The rapid proliferation of wireless networks and mobile computing applications has changed the landscape of network security. The recent denial of service attacks on major Internet sites have shown us, no o...

  • EP ID EP137645

How To Cite

Chandrashekhar P. Bhamare, Dinesh D. Patil (2015). Review of Various Text Categorization Methods. IOSR Journals (IOSR Journal of Computer Engineering), 17(3), 13-19. https://europub.co.uk/articles/-A-137645