Crawler Using Inverted WAH Bitmap Index and Searching User Defined Document Fields

Abstract

Crawler is a web crawler that searches for and retrieves web pages from the World Wide Web that are related to a specific topic. It relies on specific algorithms to select web pages relevant to a pre-defined set of topics. Its main features are a user interest specification module, which mediates between users and search engines to identify target examples and keywords that together specify the topic of interest, and a URL ordering strategy that combines features of several previous approaches and achieves significant improvement. It also provides a graphical user interface in which users can evaluate and visualize the crawling results, which can then be used as feedback to reconfigure the crawler. Such a web crawler may interact with millions of hosts over a period of weeks or months, so robustness, flexibility, and manageability are of major importance. The crawler should retrieve the web pages for the URLs in its queue, parse the HTML files, and add newly found URLs to that queue. The user then provides feedback that helps the baseline classifier to be progressively induced using active learning techniques. Once the classifier is in place, the crawler can be started on its task of resource discovery.
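The crawl loop described in the abstract (retrieve a page, parse its HTML, queue new URLs in priority order) can be illustrated with a minimal sketch. It is not the author's implementation. The keyword-frequency `relevance` score, the rule that a link inherits the score of the page it was found on, the in-memory `PAGES` dictionary standing in for the web, and the `example.test` URLs are all assumptions made for illustration. The paper's actual URL ordering strategy and classifier are not specified in the abstract.

```python
import heapq
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute link targets and visible text from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

    def handle_data(self, data):
        self.text.append(data)


def relevance(text, keywords):
    """Hypothetical topic score: fraction of words that are topic keywords."""
    words = [w.strip(".,;:!?") for w in text.lower().split()]
    if not words:
        return 0.0
    return sum(1 for w in words if w in keywords) / len(words)


def crawl(seeds, fetch, keywords, max_pages=10):
    """Best-first crawl: links found on more relevant pages are visited first."""
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        html = fetch(url)
        if html is None:  # page could not be retrieved
            continue
        parser = LinkExtractor(url)
        parser.feed(html)
        score = relevance(" ".join(parser.text), keywords)
        visited.append((url, score))
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                # Assumption: a new link inherits its parent page's score.
                heapq.heappush(frontier, (-score, link))
    return visited


# Tiny in-memory "web" (made-up pages) so the sketch runs without network access.
PAGES = {
    "http://example.test/": '<p>crawler index</p><a href="/on">on topic</a>'
                            '<a href="/off">off topic</a>',
    "http://example.test/on": '<p>crawler index bitmap</p><a href="/deep">more</a>',
    "http://example.test/off": '<p>cooking recipes</p><a href="/junk">more</a>',
    "http://example.test/deep": "<p>crawler index</p>",
    "http://example.test/junk": "<p>more recipes</p>",
}

result = crawl(["http://example.test/"], PAGES.get, {"crawler", "index"}, max_pages=4)
for url, score in result:
    print(url, round(score, 2))
```

With a page budget of four, the link discovered on the on-topic page is fetched before the link discovered on the off-topic page, which is the effect a URL ordering strategy aims for. A real crawler would replace `PAGES.get` with an HTTP fetch and add politeness delays, robots.txt handling, and error recovery.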

Authors and Affiliations

Mr. Sanjay Kumar Singh

How To Cite

Mr. Sanjay Kumar Singh (2012). Crawler Using Inverted WAH Bitmap Index and Searching User Defined Document Fields. International Journal of P2P Network Trends and Technology (IJPTT), 2(3), 56-59. https://europub.co.uk/articles/-A-109273