Improved Focused Crawler Using Inverted WAH Bitmap Index  

Abstract

Focused crawlers are programs that traverse the web by following hyperlinks and retrieve only pages relevant to a specific topic. Traditional general-purpose crawlers do not retrieve relevant pages effectively. A focused crawler instead acts as a special-purpose search engine that selectively seeks out relevant pages: it does not need to collect every web page, only those that match the topic. The central problem is therefore how to retrieve the largest possible set of relevant, high-quality pages. To address this problem, we have designed an interactive focused crawler that calculates the relevance of each web page and assigns each URL a score used to decide whether that URL is relevant to a specific topic. The interactive focused crawler first gathers pages related to the seed set using keyword extraction, search engine queries, and link-neighbourhood expansion. The collected pages are then presented to the user in ranked order, which facilitates quick elimination of negatives. The user's feedback is used to progressively induce a baseline classifier through active learning. Once the classifier is in place, the crawler can begin its task of resource discovery.
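The abstract does not give the URL scoring formula, so the following is only a minimal sketch of how a URL score could drive a focused crawl. It assumes a weighted sum of cosine similarities between the topic keywords and the URL tokens and anchor text, plus the relevance of the parent page. The names `url_score` and `Frontier`, and the weights, are hypothetical and are not taken from the paper.

```python
import heapq
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def url_score(url, anchor_text, parent_relevance, topic_terms,
              w_url=0.3, w_anchor=0.4, w_parent=0.3):
    """Assumed scoring rule: weighted sum of three relevance signals.

    The weights are illustrative placeholders, not values from the paper.
    """
    topic = Counter(topic_terms)
    return (w_url * cosine(Counter(tokenize(url)), topic)
            + w_anchor * cosine(Counter(tokenize(anchor_text)), topic)
            + w_parent * parent_relevance)


class Frontier:
    """Priority queue of unvisited URLs, highest score first."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, url, score):
        # Skip URLs that were already queued once.
        if url not in self._seen:
            self._seen.add(url)
            # heapq is a min-heap, so store the negated score.
            heapq.heappush(self._heap, (-score, url))

    def pop(self):
        neg_score, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


topic_terms = ["focused", "crawler"]
frontier = Frontier()
links = [
    ("http://example.com/recipes", "pasta recipes"),
    ("http://example.com/focused-crawler", "focused crawler tutorial"),
]
for link, anchor in links:
    frontier.push(link, url_score(link, anchor, 0.5, topic_terms))

best_url, best_score = frontier.pop()
print(best_url)
```

In this sketch the crawler always expands the highest-scoring URL next, so off-topic links sink to the bottom of the queue. The same ordering could be used to present candidate pages to the user in ranked order for feedback.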

Authors and Affiliations

Sanjay Kumar Singh, Sonu Agrawal

How To Cite

Sanjay Kumar Singh, Sonu Agrawal (2012). Improved Focused Crawler Using Inverted WAH Bitmap Index. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 1(4), 407-409. https://europub.co.uk/articles/-A-115280