Improved Focused Crawler Using Inverted WAH Bitmap Index
Journal Title: International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 2012, Vol. 1, Issue 4
Abstract
Focused crawlers are programs that traverse the internet and retrieve web pages by following hyperlinks according to a specific topic. Traditional web crawlers cannot retrieve the relevant pages effectively. A focused crawler is a special-purpose search engine that aims to selectively seek out pages relevant to a given topic. Its main characteristic is that it does not need to collect all web pages; it selects and retrieves only the relevant ones. The central problem is therefore how to retrieve the maximal set of relevant, high-quality pages. To address this problem, we have designed an Interactive Focused Crawler that calculates the relevance of each web page and computes a URL score to decide whether a URL is relevant to a specific topic. The Interactive Focused Crawler begins by gathering pages related to the seed set using techniques such as keyword extraction, search engine queries, and link-neighbourhood expansion. The collected pages are then presented to the user in ranked order, which allows irrelevant pages (negatives) to be eliminated quickly. The user's feedback is used to progressively induce a baseline classifier through active learning. Once the classifier is in place, the crawler can begin its task of resource discovery.
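The abstract does not give the URL-scoring formula, so the following Python sketch is only an illustration of best-first focused crawling under stated assumptions. Page relevance is assumed to be the cosine similarity between term-frequency vectors of the page text and the topic keywords; the hypothetical `url_score` blends the similarity of a link's URL and anchor text to the topic with the relevance of the page it was found on. The weight `alpha`, the `threshold`, and the in-memory `WEB` graph are illustrative assumptions, not the authors' method, and the interactive ranking and active-learning steps are not modelled.

```python
import heapq
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split text into lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def url_score(url: str, anchor: str, parent_relevance: float,
              topic: Counter, alpha: float = 0.5) -> float:
    """Hypothetical URL score: weighted blend of link-context similarity
    (URL tokens + anchor text) and the relevance of the parent page."""
    context = Counter(tokenize(url) + tokenize(anchor))
    return alpha * cosine(context, topic) + (1 - alpha) * parent_relevance


def focused_crawl(seeds, web, topic_terms, threshold=0.2, max_pages=10):
    """Best-first crawl over an in-memory graph {url: (text, [(link, anchor)])}.
    Returns relevant pages as (url, relevance) in the order they were accepted."""
    topic = Counter(tokenize(" ".join(topic_terms)))
    frontier = [(-1.0, url) for url in seeds]  # negated scores give a max-heap
    heapq.heapify(frontier)
    seen = set(seeds)
    relevant = []
    while frontier and len(relevant) < max_pages:
        _, url = heapq.heappop(frontier)
        page = web.get(url)
        if page is None:
            continue
        text, links = page
        relevance = cosine(Counter(tokenize(text)), topic)
        if relevance < threshold:
            continue  # irrelevant page: do not expand its out-links
        relevant.append((url, round(relevance, 3)))
        for link, anchor in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-url_score(link, anchor, relevance, topic), link))
    return relevant


# Hypothetical three-page web used only for illustration.
WEB = {
    "http://a.example/crawler": ("focused web crawler topic relevance",
                                 [("http://a.example/bitmap", "bitmap index crawler"),
                                  ("http://a.example/sports", "football scores")]),
    "http://a.example/bitmap": ("bitmap index for focused crawler pages", []),
    "http://a.example/sports": ("football match scores today", []),
}

print(focused_crawl(["http://a.example/crawler"], WEB, ["focused", "crawler", "relevance"]))
# [('http://a.example/crawler', 0.775), ('http://a.example/bitmap', 0.471)]
```

The priority queue makes the crawl best-first: the on-topic link is fetched before the sports link, and the sports page is discarded without expanding its out-links because its relevance falls below the threshold.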
Authors and Affiliations
Sanjay Kumar Singh, Sonu Agrawal