Enhanced Approach on Web Page Classification Using Machine Learning Technique
Journal Title: International Journal of Advanced Research in Computer Engineering & Technology (IJARCET) - Year 2012, Vol 1, Issue 7
Abstract
The data set contains web pages collected from the computer science departments of various universities in January 1997 by the World Wide Knowledge Base project of the CMU text learning group. The 8,282 pages were manually classified into 7 classes: 1) student, 2) faculty, 3) staff, 4) department, 5) course, 6) project and 7) other. For each class, the data set contains pages from four universities (Cornell, Texas, Washington and Wisconsin), and it also includes 4,120 miscellaneous pages from other universities. The files are organized into a directory structure, with one directory for each class. Each of these seven directories contains 5 subdirectories, one for each of the 4 universities and one for the miscellaneous pages. These subdirectories in turn contain the web pages. The proposed work first preprocesses the data to clean the data set and transform it into a form suitable for classification. Feature extraction then selects a minimal set of representative features, or terms, from each web page rather than using the entire page. Finally, a classification step based on the FP-Growth algorithm assigns each page to one of the seven classes. The proposed approach is compared with an existing system based on the Apriori algorithm.
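The abstract names FP-Growth but does not say how its output is turned into a classifier, so the following is only a minimal sketch of the frequent term-set mining step, not the authors' implementation. It assumes each preprocessed page has already been reduced to a set of representative terms. The toy pages, the `min_support` value and the names `Node`, `build_tree` and `fp_growth` are all hypothetical.

```python
from collections import defaultdict


class Node:
    """One node of the FP-tree: an item, its count, and links up and down."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns the header table (item -> nodes holding that item) and the
    support of every frequent single item.
    """
    counts = defaultdict(int)
    for items, n in transactions:
        for item in set(items):
            counts[item] += n
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    root = Node(None, None)
    header = defaultdict(list)
    for items, n in transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += n
            node = child
    return header, frequent


def fp_growth(transactions, min_support, suffix=frozenset()):
    """Return {frozenset of items: support} for all frequent itemsets."""
    header, frequent = build_tree(transactions, min_support)
    results = {}
    for item, support in frequent.items():
        itemset = suffix | {item}
        results[itemset] = support
        # Conditional pattern base: the prefix path above each node of `item`.
        conditional = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        results.update(fp_growth(conditional, min_support, itemset))
    return results


# Hypothetical term sets standing in for preprocessed web pages.
pages = [
    {"course", "homework", "syllabus"},
    {"course", "homework", "exam"},
    {"course", "syllabus"},
    {"faculty", "research", "publications"},
    {"faculty", "research"},
]
frequent_sets = fp_growth([(page, 1) for page in pages], min_support=2)
for terms, support in sorted(frequent_sets.items(),
                             key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(terms), support)
```

Unlike Apriori, which generates candidate itemsets and rescans the data for each candidate size, this approach compresses the pages into a prefix tree once and mines it recursively. Term sets that are frequent within one class could then serve as features or rules for a classifier, though that mapping is an assumption here.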
Authors and Affiliations
S. Gowri Shanthi, Dr. Antony Selvadoss Thanamani
Modeling of Fuel Cell Electrical Supply Management System for Onboard Marine Application
This paper presents the design of a dynamic model of the proton exchange fuel cell (PEMFC), developed in MATLAB, for marine applications of renewable power generation on board ships or commercial vessels. A three p...
ENHANCED TECHNIQUE FOR SECURED AND RELIABLE WATERMARKING USING MFHWT
General watermarking techniques are used for copyright protection. A watermarking scheme should achieve robustness and imperceptibility. This paper presents the watermarking algorithm in the DW...
Smart Human Resource Information System
Traditional office administration work is a headache for office employees. Smart HRIS will provide a user-friendly working environment; the system can also be used on a LAN so that multiple desks operate on the same data. Tradit...
Performance Comparison of ACO Algorithms for MANETs
A Mobile Ad Hoc Network (MANET) is a dynamic multihop wireless network established by a set of mobile nodes on a shared wireless channel. One of the major issues in MANET is routing due to the mobility of the...
Optimized Surveillance Solution for Unattended Baggage Recognition
The system automatically recognizes activities around a protected area to improve safety and security by multiplexing hundreds of video streams in real time. The object tracking method has an important role in real tim...