Enhanced Approach on Web Page Classification Using Machine Learning Technique  

Abstract

The data set contains WWW-pages collected from the computer science departments of various universities in January 1997 by the World Wide Knowledge Base project of the CMU text learning group. The 8,282 pages were manually classified into 7 classes: 1) student, 2) faculty, 3) staff, 4) department, 5) course, 6) project and 7) other. For each class, the data set contains pages from four universities (Cornell, Texas, Washington and Wisconsin) plus 4,120 miscellaneous pages from other universities. The files are organized into a directory structure with one directory for each class. Each of these seven directories contains 5 subdirectories, one for each of the 4 universities and one for the miscellaneous pages. These subdirectories in turn contain the Web pages. The proposed work first preprocesses the data to clean the data set and transform it into a form suitable for classification. Feature extraction then selects a minimal set of representative features, or terms, from each page, so that the entire Web page need not be used. Finally, each page is classified into one of the seven classes using the FP-Growth algorithm. The proposed approach is compared with an existing system based on the Apriori algorithm.
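The preprocessing and feature extraction steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the cleaning rules, the stop-word list, or the term-selection criterion, so everything here is an assumption made for illustration: the function names `clean_page` and `top_terms`, the tiny `STOP_WORDS` set, and the choice of "most frequent terms" as the representative features are all hypothetical and are not the authors' method.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; the abstract does not
# say which list (if any) the authors used.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "at"}


def clean_page(html):
    """Strip HTML tags, lowercase the text, and return the non-stop-word tokens."""
    text = re.sub(r"<[^>]+>", " ", html)           # replace each tag with a space
    tokens = re.findall(r"[a-z]+", text.lower())   # keep alphabetic runs only
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]


def top_terms(tokens, k=5):
    """Return the k most frequent terms as a reduced feature set for one page."""
    counts = Counter(tokens)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Made-up example page, not taken from the data set.
page = (
    "<html><body><h1>Course CS101</h1>"
    "<p>The course covers the homework and the course exam.</p>"
    "</body></html>"
)
features = top_terms(clean_page(page), k=3)
print(features)
```

Each page is thereby reduced to a short list of terms, which could then be treated as one transaction for a frequent-pattern miner such as FP-Growth or Apriori. How the authors actually map mined patterns to the seven classes is not stated in the abstract.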

Authors and Affiliations

S. Gowri Shanthi, Dr. Antony Selvadoss Thanamani

How To Cite

S. Gowri Shanthi, Dr. Antony Selvadoss Thanamani (2012). Enhanced Approach on Web Page Classification Using Machine Learning Technique. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 1(7), 278-282. https://europub.co.uk/articles/-A-98924