Techniques for text classification: Literature review and current trends
Journal Title: Webology - Year 2015, Vol 12, Issue 2
Abstract
Automated classification of text into predefined categories has long been considered a vital method for managing and processing the vast and continuously growing number of documents in digital form. This digital or electronic information takes the form of documents, conference material, publications, journals, editorials, web pages, e-mail, and so on. People now largely access information from these online sources rather than being limited to traditional paper sources such as books, magazines, and newspapers. The main problem is that this enormous body of information lacks organization, which makes it difficult to manage. Text classification is recognized as one of the key techniques for organizing this kind of digital data. In this paper we study the existing work in the area of text classification, which allows a fair evaluation of the progress made in this field to date. We have investigated the relevant papers to the best of our knowledge and have tried to summarize the existing information in a comprehensive and succinct manner. The studies are summarized in tabular form by publication year, considering numerous key perspectives. The main emphasis is on the steps involved in the text classification process, viz. the document representation methods, feature selection methods, data mining methods, and the evaluation technique used by each study to obtain results on a particular dataset.
Authors and Affiliations
Rajni Jindal, Ruchika Malhotra and Abha Jain