Topic-specific Web Crawler using Probability Method
Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2013, Vol 13, Issue 1
Abstract
The web has become an integral part of our lives, and search engines play an important role in helping users find online content on a specific topic. The web is a huge and highly dynamic environment that is growing exponentially in content and evolving rapidly in structure. No search engine can cover the whole web, so it has to focus on the most valuable pages when crawling. Many methods based on link and text content analysis have been developed for retrieving pages. A topic-specific web crawler collects web pages relevant to the user's topics of interest. In this paper, we present an algorithm that covers both link and text content, using Levenshtein distance and a probability method, to fetch a larger number of relevant pages for the topic specified by the user. Evaluation shows that the proposed web crawler collects the pages best matching the user's interests during the early period of crawling.
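The abstract names Levenshtein distance as one ingredient but does not spell out how it is applied. The following is a minimal sketch, not the paper's algorithm: it computes the standard edit distance and uses it to order link anchor texts by closeness to a topic string. The `similarity` normalisation and the `rank_links` helper are assumptions introduced here for illustration, and the paper's probability method is not reproduced.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(topic: str, text: str) -> float:
    """Hypothetical normalisation: 1.0 for identical strings, lower for distant ones."""
    topic, text = topic.lower(), text.lower()
    longest = max(len(topic), len(text))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(topic, text) / longest


def rank_links(topic: str, anchors: list[str]) -> list[str]:
    """Hypothetical helper: order anchor texts from most to least similar to the topic."""
    return sorted(anchors, key=lambda text: similarity(topic, text), reverse=True)
```

A crawler could use such a ranking to decide which outgoing links to visit first, for example `rank_links("web crawler", ["cooking recipes", "web crawlers", "web"])`. How the original work combines this score with link structure and probabilities is not stated in the abstract.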
Authors and Affiliations
S Subatra Devi
Enhanced Data Processing Using Positive Negative Association Mining on AJAX Data
Knowledge discovery is the process of analyzing data from different perspectives and summarizing it into useful information. [1] Association rule mining is a data mining process used widely in traditional databases to fi...
Reduce the False Positive and False Negative from Real Traffic with Intrusion Detection in Zigbee Wireless Networks
Abstract: Denial-of-Service attack in particular is a threat to Zigbee wireless networks. It is an attack in which the primary goal is to deny the legitimate users access to the resources. A node is prevented from r...
Performance Evaluation of Wlan by Varying Pcf, Dcf and Enhanced Dcf Slots To Improve Quality of Service
Researchers have proposed a number of co-ordination functions in literature for improving quality of service. Each one is based on different characteristics and properties. In this paper, we evaluate the perfor...
Optical Character Recognition using Dynamic Memory Image Algorithm
Abstract: Embedded products have different types of display screens, such as LCD, touch screen, CRT, etc. For automating the development testing of such devices, we need to recognize the text displayed on them using a ca...
Data Protection Based On Dynamic Encryption for Secure Cloud Computing
Cloud Computing is the long dreamed vision of computing as a utility, where users can remotely store their data into the cloud so as to enjoy the on-demand high quality applications and services from a shared poo...