Improved Focused Crawler Using Inverted WAH Bitmap Index  

Abstract

Focused crawlers are programs that traverse the web by following hyperlinks and retrieve pages relevant to a specific topic. Traditional web crawlers do not retrieve relevant pages effectively. A focused crawler is a special-purpose search engine that selectively seeks out relevant pages: it does not need to collect all web pages, only those that match the topic. The major problem is therefore how to retrieve the largest possible set of relevant, high-quality pages. To address this problem, we have designed an interactive focused crawler that calculates the relevancy of a web page. It computes a URL score to decide whether a URL is relevant to a specific topic. The interactive focused crawler first gathers pages related to the seed set using techniques such as keyword extraction, search engine queries, and link neighbourhood expansion. The collected pages are then presented to the user in ranked order, which makes it quick to eliminate negatives. The user's feedback is used to progressively induce a baseline classifier through active learning. Once the classifier is in place, the crawler can begin its task of resource discovery.
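The URL-scoring and ranking steps described above can be pictured with a minimal Python sketch. The abstract does not state the scoring formula, so the keyword-overlap rule, the function names `url_score` and `rank_candidates`, and the `threshold` parameter are all assumptions made for illustration, not the authors' method.

```python
import re


def url_score(url, anchor_text, topic_keywords):
    """Hypothetical score: fraction of topic keywords that appear
    as tokens in the URL or in its anchor text."""
    tokens = set(re.findall(r"[a-z0-9]+", (url + " " + anchor_text).lower()))
    keywords = {k.lower() for k in topic_keywords}
    if not keywords:
        return 0.0
    return len(tokens & keywords) / len(keywords)


def rank_candidates(candidates, topic_keywords, threshold=0.0):
    """Score (url, anchor_text) pairs, drop those with a score at or
    below the assumed threshold, and return (score, url) pairs with
    the highest score first (ties broken by URL)."""
    scored = [(url_score(u, a, topic_keywords), u) for u, a in candidates]
    kept = [(s, u) for s, u in scored if s > threshold]
    return sorted(kept, key=lambda pair: (-pair[0], pair[1]))


# Illustrative data only; these URLs are placeholders.
candidates = [
    ("http://example.com/focused-crawler", "focused crawler tutorial"),
    ("http://example.com/recipes", "cake"),
    ("http://example.com/news", "crawler"),
]
ranked = rank_candidates(candidates, ["focused", "crawler", "web"])
for score, url in ranked:
    print(f"{score:.2f}  {url}")
```

In an interactive setting, a ranked list like this would be what the user reviews; the accepted and rejected pages would then serve as labelled examples for the classifier.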

Authors and Affiliations

Sanjay Kumar Singh, Sonu Agrawal

How To Cite

Sanjay Kumar Singh, Sonu Agrawal (2012). Improved Focused Crawler Using Inverted WAH Bitmap Index. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 1(4), 407-409. https://europub.co.uk/articles/-A-115280