An Effective Approach for Web Document Classification using FP-Growth and Naïve Bayes Techniques
Journal Title: International Journal of Computer Science & Engineering Technology - Year 2012, Vol 3, Issue 10
Abstract
The exponential growth of the web has increased the importance of web document classification and data mining. Obtaining exact information about which classes a web document belongs to is expensive. Automatic classification of web documents, which provides this information at low cost, is therefore of great use to search engines. In this paper, we propose an approach for classifying web documents using the frequent word itemsets generated by the Frequent Pattern (FP) Growth technique. These sets of associated words act as the feature set. The final classification is obtained by applying a Naïve Bayes classifier to the feature set. For the experimental work, we use the Gensim package, as it is simple and robust. Results show that our approach can classify web documents effectively.
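The pipeline described in the abstract (frequent word sets as features, then a Naïve Bayes classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library rather than Gensim, it replaces FP-Growth with brute-force counting of word sets up to a small size (which finds the same frequent sets within that size limit, only less efficiently), and it assumes a Bernoulli Naïve Bayes variant with add-one smoothing. The toy documents, labels, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math


def frequent_word_sets(docs, min_support, max_size=2):
    """Return word sets (up to max_size words) found in >= min_support docs.

    Assumption: brute-force counting stands in for FP-Growth here.
    """
    counts = Counter()
    for words in docs:
        unique = sorted(set(words))
        for size in range(1, max_size + 1):
            counts.update(combinations(unique, size))
    return sorted(s for s, c in counts.items() if c >= min_support)


def featurize(words, word_sets):
    """Binary vector: is every word of each frequent set in the document?"""
    present = set(words)
    return [all(w in present for w in s) for s in word_sets]


def train_naive_bayes(vectors, labels):
    """Bernoulli Naive Bayes with add-one (Laplace) smoothing."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(lambda: [0] * len(vectors[0]))
    for vec, label in zip(vectors, labels):
        for i, on in enumerate(vec):
            feature_counts[label][i] += on
    model = {}
    for label, n in class_counts.items():
        log_prior = math.log(n / len(labels))
        probs = [(c + 1) / (n + 2) for c in feature_counts[label]]
        model[label] = (log_prior, probs)
    return model


def predict(model, vec):
    """Return the class with the highest log posterior score."""
    def score(item):
        log_prior, probs = item[1]
        return log_prior + sum(
            math.log(p if on else 1 - p) for p, on in zip(probs, vec)
        )
    return max(model.items(), key=score)[0]


# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["football", "match", "goal", "team"],
    ["team", "goal", "league", "match"],
    ["stock", "market", "shares", "price"],
    ["market", "price", "stock", "trade"],
]
labels = ["sports", "sports", "finance", "finance"]

word_sets = frequent_word_sets(docs, min_support=2)
model = train_naive_bayes([featurize(d, word_sets) for d in docs], labels)
print(predict(model, featurize(["goal", "team", "win"], word_sets)))
```

In a real setting, the word sets would be mined from preprocessed web documents (tokenized, stop words removed), and the minimum support threshold would need tuning, since it controls how many feature sets the classifier sees.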
Authors and Affiliations
Rajendra Kumar Roul, Dr. Sanjay Kumar Sahay
Reduce Total Distance and Time Using Genetic Algorithm in Traveling Salesman Problem
The traveling salesman problem is well known in the field of combinatorial optimization. This research describes how the traveling salesman problem is solved by the heuristic method of genetic algorithms. This resear...
Ant Colony Optimization Algorithm Based Vehicle Theft Prediction-Prevention and Recovery System Model (Aco-Vtp2rsm)
Existing vehicle security technologies are capable of theft prevention, recovery, or both. They lack the capability to predict theft occurrence, and this makes the task of theft prevention or recovery unattainabl...
A Survey on Under Water Images Enhancement Techniques
The major causes of poor quality in underwater images are light scattering and color change. One method of improving the quality of an image is image enhancement. This paper presents a comparative study of various image enhanc...
Feedback Routing Algorithm in optical WDM Networks
This study mainly concentrates on the routing problem in optical WDM networks. In a WDM network, the wavelength continuity constraint must be taken care of in data communication. Lightpath is the communication channel between...
Predicting Students Attrition using Data Mining
Student attrition has become one of the most important measures of success for higher education institutions. It is an important issue for all institutions due to the potential negative impact on the image of the univers...