Techniques for text classification: Literature review and current trends
Journal Title: Webology - Year 2015, Vol 12, Issue 2
Abstract
Automated classification of text into predefined categories has long been regarded as a vital method for managing and processing the vast and continuously growing number of documents available in digital form. This digital (electronic) information takes the form of documents, conference material, publications, journals, editorials, web pages, e-mail and so on. People now largely access information from these online sources rather than relying only on traditional paper sources such as books, magazines and newspapers. The main problem is that this enormous body of information lacks organization, which makes it difficult to manage. Text classification is recognized as one of the key techniques for organizing such digital data. In this paper we study the existing work on text classification in order to give a fair evaluation of the progress made in the field to date. To the best of our knowledge we have surveyed the relevant papers, and we have tried to summarize the existing information in a comprehensive and succinct manner. The studies are summarized in tabular form by publication year, considering numerous key perspectives. The main emphasis is on the steps involved in the text classification process, viz. the document representation methods, feature selection methods, data mining methods and evaluation technique that each study used to obtain its results on a particular dataset.
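The four steps named in the abstract (document representation, feature selection, a data mining method, evaluation) can be illustrated with a minimal sketch. This is not a method taken from any of the surveyed studies. It assumes a bag-of-words representation, feature selection by document frequency, a multinomial Naive Bayes classifier with add-one smoothing, and accuracy as the evaluation measure. The toy dataset and all function names are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Document representation: lowercase bag of words (an assumption).
    return text.lower().split()

def select_features(docs, k):
    # Feature selection: keep the k terms with the highest document
    # frequency; ties are broken alphabetically for determinism.
    df = Counter()
    for text, _ in docs:
        df.update(set(tokenize(text)))
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return {term for term, _ in ranked[:k]}

def train_nb(docs, vocab):
    # Data mining method: multinomial Naive Bayes, storing per-class
    # document counts and per-class term counts.
    class_docs = Counter()
    term_counts = {}
    for text, label in docs:
        class_docs[label] += 1
        counts = term_counts.setdefault(label, Counter())
        counts.update(t for t in tokenize(text) if t in vocab)
    return class_docs, term_counts

def predict(text, model, vocab):
    class_docs, term_counts = model
    total = sum(class_docs.values())
    tokens = [t for t in tokenize(text) if t in vocab]
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        counts = term_counts[label]
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing
        score = math.log(class_docs[label] / total)
        score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(docs, model, vocab):
    # Evaluation: fraction of documents whose predicted label is correct.
    hits = sum(predict(text, model, vocab) == label for text, label in docs)
    return hits / len(docs)

# Hypothetical toy data, for illustration only.
TRAIN = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project meeting notes attached", "ham"),
]
TEST = [("buy pills now", "spam"), ("monday project meeting", "ham")]

vocab = select_features(TRAIN, k=8)
model = train_nb(TRAIN, vocab)
print(accuracy(TEST, model, vocab))
```

Each function corresponds to one step, so a different representation, selection criterion or classifier can be substituted without changing the rest of the sketch.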
Authors and Affiliations
Rajni Jindal, Ruchika Malhotra and Abha Jain