Review of Various Text Categorization Methods
Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2015, Vol 17, Issue 3
Abstract
Measuring the similarity between documents is an important operation in text processing. Text categorization (TC), also known as text classification or topic spotting, is the task of automatically sorting a set of documents into categories from a predefined set [1]; put another way, it is the task of automatically classifying unlabeled natural language documents into a predefined set of semantic categories [2]. Term weighting methods assign appropriate weights to terms in order to improve the performance of text categorization [1]. The traditional term weighting methods borrowed from information retrieval (IR), such as binary, term frequency (tf), tf.idf, and its variants, are unsupervised term weighting methods, because their calculation does not use the category membership of the training documents. Supervised term weighting methods, by contrast, make use of this known information in several ways. This raises two fundamental questions. First, does the difference between supervised and unsupervised term weighting methods have any relationship with the learning algorithm used? Second, if normalized term frequency (ntf) is used in place of term frequency, together with relevance frequency (rf), the resulting method is ntf.rf; is this new method effective for text categorization? We aim to answer these questions by implementing supervised and unsupervised term weighting methods, including the new ntf.rf method. The proposed TC method will be evaluated through a number of experiments on two benchmark text collections, 20NewsGroups and Reuters.
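To make the ntf.rf idea concrete, the following Python sketch computes such a weight on a toy corpus. It is only an illustration under stated assumptions, not the authors' implementation: the abstract does not define ntf, so the sketch assumes ntf is a term's count divided by the largest term count in the same document, and it assumes rf takes the form log2(2 + a / max(1, c)) commonly used for relevance frequency, where a and c are the numbers of positive-category and negative-category training documents containing the term. All function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def normalized_tf(tokens):
    """Assumed ntf: term count divided by the largest term count in the document."""
    counts = Counter(tokens)
    if not counts:
        return {}
    max_count = max(counts.values())
    return {term: n / max_count for term, n in counts.items()}


def relevance_frequency(term, positive_docs, negative_docs):
    """Assumed rf: log2(2 + a / max(1, c)).

    a = number of positive-category documents containing the term
    c = number of negative-category documents containing the term
    """
    a = sum(1 for doc in positive_docs if term in doc)
    c = sum(1 for doc in negative_docs if term in doc)
    return math.log2(2 + a / max(1, c))


def ntf_rf(tokens, positive_docs, negative_docs):
    """Weight each term of one document by ntf * rf (supervised: uses category labels)."""
    ntf = normalized_tf(tokens)
    return {
        term: weight * relevance_frequency(term, positive_docs, negative_docs)
        for term, weight in ntf.items()
    }


# Hypothetical toy training data for a single category (e.g. "grain").
positive_docs = [{"grain", "wheat", "price"}, {"wheat", "export"}]
negative_docs = [{"price", "stock"}, {"stock", "market"}]

# Document to be weighted, as a token list.
document = ["wheat", "wheat", "price"]
weights = ntf_rf(document, positive_docs, negative_docs)
print(weights)
```

In this toy example "wheat" appears only in the positive category, so it receives a larger weight than "price", which appears equally often in both categories. An unsupervised scheme such as tf.idf would not see that distinction, because it ignores the category labels.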
Authors and Affiliations
Chandrashekhar P. Bhamare, Dinesh D. Patil