A Novel Approach for Web Document Classification
Journal Title: International Journal of Computer Science & Engineering Technology - Year 2013, Vol 4, Issue 8
Abstract
The web is a huge repository of information, and web documents must be categorized to facilitate search and retrieval. Web document classification therefore plays an important role in information organization and retrieval. This paper presents a fuzzy-set-based approach for automatically assigning a web document to one of the classes represented in a set of labeled training documents. A single word carrying more than one meaning, and several words carrying the same meaning, both lead to ambiguity, especially in the web environment, where the number of users is very large. This problem is tackled using fuzzy association, wherein each pair of words has a value associated with it. This value distinguishes the pair from other word pairs and thus helps in resolving ambiguities. The approach presented in this paper does not require any user-supplied parameter and hence is independent of any bias that may arise from user input. It requires a training set on which the model is trained; a test set is then given as input to be classified. We used the Gensim package to implement the approach because of its simplicity and robustness. The experimental results show that our approach efficiently classifies web documents by tackling ambiguities among words.
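The abstract does not give the formula behind the fuzzy association value, so the following is only an illustrative sketch of one common choice, not the paper's method or its Gensim implementation. It assumes each term has a membership in each document equal to its frequency divided by the highest term frequency in that document, and it scores a word pair by the ratio of summed minimum to summed maximum memberships (a fuzzy Jaccard measure). The function names `term_memberships`, `fuzzy_association`, and `association_table` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def term_memberships(docs):
    """Map each term to {doc_index: membership}.

    Assumption: membership = term frequency / highest term frequency
    in the same document, so every value lies in (0, 1].
    """
    memberships = {}
    for idx, text in enumerate(docs):
        counts = Counter(text.lower().split())
        if not counts:
            continue  # skip empty documents
        peak = max(counts.values())
        for term, n in counts.items():
            memberships.setdefault(term, {})[idx] = n / peak
    return memberships


def fuzzy_association(mu_a, mu_b):
    """Fuzzy Jaccard score of two terms: sum of min / sum of max memberships."""
    doc_ids = set(mu_a) | set(mu_b)
    num = sum(min(mu_a.get(d, 0.0), mu_b.get(d, 0.0)) for d in doc_ids)
    den = sum(max(mu_a.get(d, 0.0), mu_b.get(d, 0.0)) for d in doc_ids)
    return num / den if den else 0.0


def association_table(docs):
    """Score every unordered pair of terms; keys are alphabetically ordered pairs."""
    mu = term_memberships(docs)
    return {
        (a, b): fuzzy_association(mu[a], mu[b])
        for a, b in combinations(sorted(mu), 2)
    }


# Toy corpus (made up for illustration): "bank" appears in two different contexts.
docs = ["bank river water", "bank money loan", "river water fish"]
table = association_table(docs)
print(table[("river", "water")])  # always co-occur
print(table[("bank", "money")])   # co-occur in one of two documents
print(table[("fish", "money")])   # never co-occur
```

Under these assumptions, words that always appear together score highest, while an ambiguous word such as "bank" receives a distinct, lower score with each of its contexts, which is the kind of pairwise signal the abstract describes.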
Authors and Affiliations
Rajendra Kumar Roul