Topic-specific Web Crawler using Probability Method

Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2013, Vol 13, Issue 1

Abstract

The web has become an integral part of our lives, and search engines play an important role in helping users find online content on a specific topic. The web is a huge and highly dynamic environment that is growing exponentially in content and evolving rapidly in structure. No search engine can cover the whole web, so each must focus its crawling on the most valuable pages. Many methods based on link and text content analysis have been developed for retrieving pages. A topic-specific web crawler collects the web pages relevant to the user's topics of interest. In this paper, we present an algorithm that analyzes links and text content using Levenshtein distance and a probability method to fetch a larger number of relevant pages for the topic specified by the user. Evaluation shows that the proposed web crawler collects the pages most relevant to the user's interests during the early stages of crawling.
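
The abstract names Levenshtein distance as one ingredient of the relevance measure but does not give the scoring formula. The following is a minimal sketch of how edit distance could be used to rank candidate links against a topic. The function names `levenshtein`, `similarity`, and `rank_links`, the lowercase normalization, and the length-based scaling are all illustrative assumptions, not the paper's method, and the probability step is not modeled here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(topic: str, text: str) -> float:
    """Hypothetical score in [0, 1]; 1.0 means the strings are identical."""
    topic, text = topic.lower(), text.lower()
    longest = max(len(topic), len(text))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(topic, text) / longest


def rank_links(topic: str, anchors: list[str]) -> list[str]:
    """Order anchor texts from most to least similar to the topic."""
    return sorted(anchors, key=lambda t: similarity(topic, t), reverse=True)


if __name__ == "__main__":
    print(rank_links("web crawler", ["cooking recipes", "web crawlers"]))
```

In a crawler built along these lines, the ranked anchors would feed a priority queue so that links judged closer to the topic are fetched first.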

Authors and Affiliations

S Subatra Devi

How To Cite

S Subatra Devi (2013). Topic-specific Web Crawler using Probability Method. IOSR Journals (IOSR Journal of Computer Engineering), 13(1), 102-106. https://europub.co.uk/articles/-A-104170