A Novel Approach for Web Document Classification

Abstract

The web is a huge repository of information, and web documents need to be categorized to facilitate search and retrieval. Web document classification plays an important role in information organization and retrieval. This paper presents a fuzzy-set-based approach for automatically classifying web documents into one of the classes represented by a set of labeled training documents. Using the same word to represent more than one meaning, and many words to represent one meaning, leads to ambiguity, especially in the web environment, where the number of users is very large. This problem is tackled using fuzzy association, wherein each pair of words has a value associated with it. This value helps distinguish the pair from other such pairs of words and thus helps in tackling ambiguities. The approach presented in this paper does not require any parameter to be given by the user and hence is independent of any bias that may occur due to user input. It requires a training set on which the model is trained; a test set is then given as input to be classified. We used the Gensim package to implement the approach because of its simplicity and robust nature. The experimental results show that our approach efficiently classifies web documents by tackling ambiguities among the words.
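The idea of attaching a value to each pair of words can be illustrated with a minimal sketch. The abstract does not give the paper's association formula, so the code below assumes a common co-occurrence measure from fuzzy information retrieval, n_ab / (n_a + n_b - n_ab), where n_a is the number of documents containing word a and n_ab the number containing both. The function names (`tokenize`, `fuzzy_associations`, `train`, `classify`), the scoring rule, and the toy documents are all hypothetical and are not taken from the paper or from Gensim.

```python
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Return the sorted set of lowercase words, so pairs have a stable order."""
    return sorted(set(text.lower().split()))


def fuzzy_associations(docs):
    """Map each co-occurring word pair to a value in (0, 1].

    Assumed measure: n_ab / (n_a + n_b - n_ab), counted over documents.
    """
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing both words of a pair
    for doc in docs:
        words = tokenize(doc)
        word_df.update(words)
        pair_df.update(combinations(words, 2))
    return {
        (a, b): n_ab / (word_df[a] + word_df[b] - n_ab)
        for (a, b), n_ab in pair_df.items()
    }


def train(labeled_docs):
    """Build one pair-association table per class from (label, text) tuples."""
    by_class = {}
    for label, doc in labeled_docs:
        by_class.setdefault(label, []).append(doc)
    return {label: fuzzy_associations(docs) for label, docs in by_class.items()}


def classify(model, doc):
    """Pick the class whose table gives the document's word pairs the highest mean value."""
    pairs = list(combinations(tokenize(doc), 2))

    def score(assoc):
        if not pairs:
            return 0.0
        return sum(assoc.get(pair, 0.0) for pair in pairs) / len(pairs)

    return max(model, key=lambda label: score(model[label]))


# Toy training set: "bank" is ambiguous between the two classes.
TRAINING = [
    ("finance", "bank loan interest rate"),
    ("finance", "bank account interest deposit"),
    ("geography", "river bank water flow"),
    ("geography", "river water erosion bank"),
]
```

In this toy setup, calling `classify(train(TRAINING), "bank interest loan")` scores the pairs against each class table. The ambiguous word "bank" is resolved by the words it is paired with, and no user-supplied threshold is involved, which mirrors the parameter-free property the abstract describes.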

Authors and Affiliations

Rajendra Kumar Roul

How To Cite

Rajendra Kumar Roul (2013). A Novel Approach for Web Document Classification. International Journal of Computer Science & Engineering Technology, 4(8), 1118-1125. https://europub.co.uk/articles/-A-146491