Enhanced Approach on Web Page Classification Using Machine Learning Technique
Journal Title: International Journal of Advanced Research in Computer Engineering & Technology (IJARCET) - Year 2012, Vol 1, Issue 7
Abstract
The data set contains WWW pages collected in January 1997 from the computer science departments of various universities by the World Wide Knowledge Base (WebKB) project of the CMU text learning group. The 8,282 pages were manually classified into seven classes: 1) student, 2) faculty, 3) staff, 4) department, 5) course, 6) project, and 7) other. For each class, the data set contains pages from four universities (Cornell, Texas, Washington, and Wisconsin) as well as miscellaneous pages from other universities, 4,120 of them in total. The files are organized into a directory structure with one directory per class. Each of these seven directories contains five subdirectories, one for each of the four universities and one for the miscellaneous pages, and these subdirectories hold the Web pages themselves. The proposed work first preprocesses the data set to clean it and transform it into patterns suitable for classification. Feature extraction then selects only a minimal number of representative features (terms) from each page instead of using the entire Web page. Finally, the FP-Growth algorithm is used to classify each page into one of the seven classes. The proposed approach is compared with an existing system based on the Apriori algorithm.
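The abstract does not state exactly how the preprocessing, feature extraction, or FP-Growth-based classification are implemented. The Python sketch below is a minimal illustration under stated assumptions only: a toy stop-word list, keeping each page's `k` most frequent terms as its features, one transaction per page, and a hypothetical scoring rule that assigns a page to the class whose frequent term sets it matches most strongly (an associative-classification style). The names `load_webkb`, `preprocess`, `train`, and `classify`, the parameters `k` and `min_support`, and the sample pages are all illustrative and are not taken from the paper.

```python
import re
from collections import Counter, defaultdict
from pathlib import Path

# Hypothetical stop-word list; the paper does not specify its preprocessing rules.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "is", "on", "at", "my"}


def load_webkb(root):
    """Hypothetical loader for the layout <root>/<class>/<university>/<page>."""
    pages = defaultdict(list)
    for page in Path(root).glob("*/*/*"):
        if page.is_file():
            pages[page.parts[-3]].append(page.read_text(encoding="utf-8", errors="ignore"))
    return dict(pages)


def preprocess(html, k=5):
    """Strip tags, lowercase, drop stop words, keep the k most frequent terms."""
    text = re.sub(r"<[^>]+>", " ", html)
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    return frozenset(term for term, _ in Counter(tokens).most_common(k))


class Node:
    """One FP-tree node: an item, its count, a parent link, and child links."""
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}


def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return item supports and the header table."""
    support = Counter()
    for items, count in transactions:
        for item in items:
            support[item] += count
    support = {i: s for i, s in support.items() if s >= min_support}
    root, header = Node(None, None), defaultdict(list)
    for items, count in transactions:
        # Insert frequent items in descending-support order so common prefixes share nodes.
        node = root
        for item in sorted((i for i in items if i in support), key=lambda i: (-support[i], i)):
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return support, header


def fp_growth(transactions, min_support, suffix=frozenset()):
    """Return {itemset: support} for every itemset with support >= min_support."""
    support, header = build_tree(transactions, min_support)
    patterns = {}
    for item, s in support.items():
        itemset = suffix | {item}
        patterns[itemset] = s
        # Conditional pattern base: the prefix path above each node that holds `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        patterns.update(fp_growth(base, min_support, itemset))
    return patterns


def train(pages_by_class, min_support=2):
    """Mine frequent term sets separately for each class (one transaction per page)."""
    return {label: fp_growth([(preprocess(p), 1) for p in pages], min_support)
            for label, pages in pages_by_class.items()}


def classify(html, model):
    """Hypothetical scoring rule: sum support x length over the class patterns the page contains."""
    terms = preprocess(html)

    def score(patterns):
        return sum(s * len(itemset) for itemset, s in patterns.items() if itemset <= terms)

    return max(model, key=lambda label: score(model[label]))


# Tiny made-up sample pages, for illustration only.
pages = {
    "course": ["<h1>CS 101 Syllabus</h1><p>lecture homework exam syllabus</p>",
               "<p>lecture notes homework exam</p>",
               "<p>syllabus lecture exam grading</p>"],
    "faculty": ["<h1>Professor Smith</h1><p>research publications professor</p>",
                "<p>professor research students publications</p>",
                "<p>research professor office hours</p>"],
}
model = train(pages)
print(classify("<p>homework lecture exam</p>", model))  # course
```

FP-Growth compresses the transactions into a prefix tree and mines it recursively through conditional pattern bases, so unlike Apriori it never generates and tests candidate itemsets level by level; that difference is the usual motivation for comparing the two algorithms. On the real data set, `load_webkb` would be pointed at the root of the directory layout described above and its result passed to `train`.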
Authors and Affiliations
S. Gowri Shanthi, Dr. Antony Selvadoss Thanamani