Comparison of Performance in Text Mining Using Text Categorization of Semi Structured Data

Abstract

Text mining, or knowledge discovery from text, is a subprocess of data mining that is widely used to discover hidden patterns and significant information in large amounts of unstructured data. The information stored in unstructured or semi-structured data cannot be used directly for further processing by computers, which typically handle text as simple sequences of character strings. Therefore, specific pre-processing methods and algorithms are required to extract useful patterns. In this study, we compare the classification performance of Bayesian methods, k-NN, decision trees, SVM, and a neural network on the well-known 20_newsgroup dataset from the CMU Text Learning Group Data Archives, a collection of 20,000 messages gathered from 20 different netnews newsgroups. The messages are classified according to their content.
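
To make the pre-processing and classification steps concrete, the following is a minimal sketch of one of the listed approaches, a multinomial naive Bayes classifier over bag-of-words counts. It is not the authors' implementation: the function names, the tokenization rule, the add-one smoothing, and the four toy messages are all assumptions made for illustration. Only the two label names are taken from the 20_newsgroup collection.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Pre-processing step: lowercase the text and keep alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(docs):
    """Count documents per class and words per class from (text, label) pairs."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_docs[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior for the given text."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_docs):
        # Log prior: fraction of training documents in this class.
        score = math.log(class_docs[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][tok] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy messages, not taken from the real dataset.
TOY_DOCS = [
    ("the team won the hockey game last night", "rec.sport.hockey"),
    ("the goalie made a great save in the game", "rec.sport.hockey"),
    ("the shuttle launch was delayed by nasa", "sci.space"),
    ("the orbit of the satellite around the moon", "sci.space"),
]

model = train_naive_bayes(TOY_DOCS)
prediction_hockey = predict(model, "who won the game")
prediction_space = predict(model, "nasa satellite launch")
```

A comparison like the one described in the abstract would train each classifier on the same pre-processed features and measure accuracy on held-out messages; this sketch covers only the Bayesian case on toy data.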

Authors and Affiliations

M. Nandhiya, Ms. M. Sakthi

How To Cite

M. Nandhiya, Ms. M. Sakthi (2016). Comparison of Performance in Text Mining Using Text Categorization of Semi Structured Data. International Journal for Research in Applied Science and Engineering Technology (IJRASET), 4(9), -. https://europub.co.uk/articles/-A-22659