Review of Various Text Categorization Methods
Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2015, Vol 17, Issue 3
Abstract
Measuring the similarity between documents is an important operation in text processing. Text categorization (TC), also known as text classification or topic spotting, is the task of automatically sorting a set of documents into categories from a predefined set [1]; put another way, it is the task of automatically classifying unlabeled natural-language documents into a predefined set of semantic categories [2]. Term weighting methods assign appropriate weights to terms in order to improve the performance of text categorization [1]. The traditional term weighting methods borrowed from information retrieval (IR), such as binary, term frequency (tf), tf.idf, and its variants, are unsupervised: their calculation makes no use of information on the category membership of the training documents. Supervised term weighting methods, by contrast, exploit this known information in several ways. Two questions therefore arise. First, does the difference between supervised and unsupervised term weighting methods have any relationship with the learning algorithm used? Second, if normalized term frequency (ntf) is used instead of raw term frequency together with relevance frequency (rf), giving a new method, ntf.rf, is this method effective for text categorization? We aim to answer these questions by implementing supervised and unsupervised term weighting methods, including the new ntf.rf method. The proposed TC method will be evaluated in a number of experiments on two benchmark text collections, 20NewsGroups and Reuters.
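To make the distinction between unsupervised and supervised weighting concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not give formulas, so two assumptions are made here: rf is taken in its commonly cited form, log2(2 + a / max(1, c)), where a and c are the numbers of positive-category and negative-category training documents containing the term, and ntf is assumed to be the term's count divided by the largest term count in the same document. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def tf_idf(term, doc_tokens, corpus):
    """Unsupervised weight: raw term count times log(N / document frequency).

    Uses no category labels, only counts over the whole corpus.
    """
    tf = Counter(doc_tokens)[term]
    df = sum(1 for doc in corpus if term in doc)
    return tf * math.log(len(corpus) / df) if df else 0.0


def rf(term, positive_docs, negative_docs):
    """Supervised factor (assumed form): log2(2 + a / max(1, c)).

    a = positive-category documents containing the term,
    c = negative-category documents containing the term.
    """
    a = sum(1 for doc in positive_docs if term in doc)
    c = sum(1 for doc in negative_docs if term in doc)
    return math.log2(2 + a / max(1, c))


def ntf_rf(term, doc_tokens, positive_docs, negative_docs):
    """ntf.rf with ntf ASSUMED to be count / max count in the document."""
    counts = Counter(doc_tokens)
    if not counts:
        return 0.0
    ntf = counts[term] / max(counts.values())
    return ntf * rf(term, positive_docs, negative_docs)


# Hypothetical toy collection: two "sports" documents (positive category)
# and two "finance" documents (negative category), already tokenized.
positive = [["ball", "goal", "team"], ["team", "goal", "goal"]]
negative = [["stock", "market", "team"], ["market", "price", "team"]]
corpus = positive + negative
doc = positive[1]

for term in ("goal", "team"):
    print(term,
          "tf.idf =", round(tf_idf(term, doc, corpus), 3),
          "ntf.rf =", round(ntf_rf(term, doc, positive, negative), 3))
```

In this toy collection "team" occurs in every document, so its tf.idf weight is zero, while "goal" occurs only in the positive category and receives the larger rf factor. The sketch illustrates only how category labels enter the weight; it says nothing about which method performs better on real collections.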
Authors and Affiliations
Chandrashekhar P. Bhamare, Dinesh D. Patil