Review of Various Text Categorization Methods
Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2015, Vol 17, Issue 3
Abstract
Measuring the similarity between documents is an important operation in the text processing field. Text categorization (TC), also known as text classification or topic spotting, is the task of automatically sorting a set of documents into categories from a predefined set [1]; equivalently, it is the task of automatically classifying unlabeled natural language documents into a predefined set of semantic categories [2]. Term weighting methods assign appropriate weights to terms in order to improve the performance of text categorization [1]. The traditional term weighting methods borrowed from information retrieval (IR), such as binary, term frequency (tf), tf.idf, and its variants, are unsupervised term weighting methods, because their calculation does not use the category membership of the training documents. Supervised term weighting methods, in contrast, exploit this known information in several ways. This raises a fundamental question: "Does the difference between supervised and unsupervised term weighting methods have any relationship with different learning algorithms?" Furthermore, if normalized term frequency is used instead of term frequency, together with relevance frequency, the resulting method is ntf.rf; but is this new method effective for text categorization? We would like to answer these questions by implementing new supervised and unsupervised term weighting methods (ntf.rf). The proposed TC method will be evaluated through a number of experiments on two benchmark text collections, 20 Newsgroups and Reuters.
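To make the distinction between unsupervised and supervised term weighting concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the commonly cited form of relevance frequency, rf = log2(2 + a / max(1, c)), where a and c are the numbers of positive-category and negative-category training documents containing the term. It also assumes that "normalized term frequency" means tf divided by the largest tf in the same document; the abstract does not define it. The function names `tf_idf`, `relevance_frequency`, and `ntf_rf` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Unsupervised weighting: tf * ln(N / df). Category labels are never used."""
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: tf * math.log(n_docs / df[t]) for t, tf in Counter(doc).items()}
        for doc in docs
    ]


def relevance_frequency(docs, labels, positive):
    """Supervised factor (assumed form): rf(t) = log2(2 + a / max(1, c))."""
    pos, neg = Counter(), Counter()
    for doc, label in zip(docs, labels):
        # Count each term once per document, split by category membership.
        (pos if label == positive else neg).update(set(doc))
    vocab = set(pos) | set(neg)
    return {t: math.log2(2 + pos[t] / max(1, neg[t])) for t in vocab}


def ntf_rf(docs, labels, positive):
    """ntf.rf, with ntf ASSUMED to be tf / (max tf in the same document)."""
    rf = relevance_frequency(docs, labels, positive)
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        max_tf = max(counts.values()) if counts else 1
        weighted.append({t: (tf / max_tf) * rf[t] for t, tf in counts.items()})
    return weighted


# Toy training set (illustrative only, not taken from Reuters or 20 Newsgroups).
docs = [["wheat", "price", "wheat"], ["wheat", "export"], ["stock", "price"]]
labels = ["grain", "grain", "finance"]

unsupervised = tf_idf(docs)
supervised = ntf_rf(docs, labels, positive="grain")
```

In this toy set, "wheat" occurs only in documents of the positive category, so its rf factor is larger than that of "price", which occurs in both categories. The tf.idf weights are computed without looking at `labels` at all, which is what makes that scheme unsupervised.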
Authors and Affiliations
Chandrashekhar P. Bhamare, Dinesh D. Patil