Review of Various Text Categorization Methods

Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2015, Vol 17, Issue 3

Abstract

Measuring the similarity between documents is an important operation in the text processing field. Text categorization (also known as text classification or topic spotting) is the task of automatically sorting a set of documents into categories from a predefined set [1]. Equivalently, text categorization (TC) is the task of automatically classifying unlabeled natural language documents into a predefined set of semantic categories [2]. Term weighting methods assign appropriate weights to terms in order to improve the performance of text categorization [1]. The traditional term weighting methods borrowed from information retrieval (IR), such as binary, term frequency (tf), tf.idf, and its variants, are unsupervised term weighting methods, because their calculation does not make use of information on the category membership of training documents. Supervised term weighting methods, by contrast, use this known information in several ways. This raises a fundamental question: “Does the difference between supervised and unsupervised term weighting methods have any relationship with different learning algorithms?” Furthermore, if normalized term frequency (ntf) is used instead of term frequency together with relevance frequency (rf), the resulting method is ntf.rf, but is this new method effective for text categorization? We would like to answer these questions by implementing supervised and unsupervised term weighting methods, including the new ntf.rf. The proposed TC method will be evaluated in a number of experiments on two benchmark text collections, 20NewsGroups and Reuters.
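The distinction between unsupervised and supervised term weighting can be illustrated with a small sketch. The abstract does not define rf or ntf, so the code below rests on two assumptions: rf takes the commonly cited form log2(2 + a / max(1, c)), where a and c count the positive-category and negative-category training documents containing the term, and ntf divides the term frequency by the largest term frequency in the same document. The function names and the toy corpus are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def idf(term, docs):
    """Unsupervised weight: log(N / df). Category labels are never consulted."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0

def rf(term, docs, labels, category):
    """Supervised weight (assumed form): log2(2 + a / max(1, c)).

    a = documents of `category` containing the term,
    c = documents of other categories containing the term.
    """
    a = sum(1 for d, y in zip(docs, labels) if y == category and term in d)
    c = sum(1 for d, y in zip(docs, labels) if y != category and term in d)
    return math.log2(2 + a / max(1, c))

def ntf(term, doc):
    """Assumed normalization: tf divided by the largest tf in the document."""
    counts = Counter(doc)
    return counts[term] / max(counts.values()) if counts else 0.0

def ntf_rf(term, doc, docs, labels, category):
    """Combined weight: normalized term frequency times relevance frequency."""
    return ntf(term, doc) * rf(term, docs, labels, category)

# Hypothetical toy corpus: tokenized documents with one label each.
docs = [
    ["oil", "price", "oil", "market"],
    ["oil", "export", "market"],
    ["team", "match", "market"],
    ["team", "goal", "market"],
]
labels = ["energy", "energy", "sport", "sport"]

for term in ["oil", "market", "team"]:
    print(term,
          round(idf(term, docs), 3),
          round(rf(term, docs, labels, "energy"), 3),
          round(ntf_rf(term, docs[0], docs, labels, "energy"), 3))
```

In this toy corpus "market" occurs in every document, so its idf is zero, while rf still separates it from "oil" (which occurs only in the "energy" documents) because rf uses the category labels.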

Authors and Affiliations

Chandrashekhar P. Bhamare, Dinesh D. Patil

How To Cite

Chandrashekhar P. Bhamare, Dinesh D. Patil (2015). Review of Various Text Categorization Methods. IOSR Journals (IOSR Journal of Computer Engineering), 17(3), 13-19. https://europub.co.uk/articles/-A-137645