A Novel Architecture for Domain Specific Parallel Crawler

Journal Title: Indian Journal of Computer Science and Engineering - Year 2010, Vol 1, Issue 1

Abstract

The World Wide Web is an interlinked collection of billions of documents formatted using HTML. Because the web is growing and dynamic, traversing and handling all the URLs in web documents has become a challenge, so it has become imperative to parallelize the crawling process. The crawling process is parallelized in the form of an ecology of crawler workers that download information from the web in parallel. This paper proposes a novel parallel crawler architecture based on domain-specific crawling, which makes the crawling task more effective and scalable and shares the load among the different crawlers, each of which downloads, in parallel with the others, web pages related to URLs of a specific domain.
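The abstract does not spell out the architecture, so the following is only a minimal sketch of the general idea of domain-based load sharing, not the authors' design. It assumes that "domain" means the last label of a URL's host name (for example `edu` or `com`), and that each domain gets its own crawler worker. All function names (`domain_of`, `partition_by_domain`, `crawl_domain`, `parallel_crawl`) are hypothetical, and `fake_fetch` is a stand-in for a real download so the example runs without network access.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


def domain_of(url):
    """Return the last label of the host name (e.g. 'edu').

    This notion of 'domain' is an assumption made for the sketch.
    """
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1] if host else "unknown"


def partition_by_domain(urls):
    """Group URLs into one queue per domain so each crawler gets its own share."""
    queues = defaultdict(list)
    for url in urls:
        queues[domain_of(url)].append(url)
    return dict(queues)


def fake_fetch(url):
    """Placeholder for a real download; returns a dummy page body."""
    return f"<html>content of {url}</html>"


def crawl_domain(domain, urls, fetch=fake_fetch):
    """One crawler worker: download every URL assigned to its domain."""
    return domain, {url: fetch(url) for url in urls}


def parallel_crawl(urls, fetch=fake_fetch):
    """Run one worker per domain queue in parallel and collect the results."""
    queues = partition_by_domain(urls)
    with ThreadPoolExecutor(max_workers=max(1, len(queues))) as pool:
        futures = [pool.submit(crawl_domain, d, q, fetch) for d, q in queues.items()]
        return dict(f.result() for f in futures)


if __name__ == "__main__":
    seeds = [
        "http://example.edu/a",
        "http://example.com/b",
        "http://other.edu/c",
    ]
    for domain, pages in parallel_crawl(seeds).items():
        print(domain, sorted(pages))
```

Partitioning before dispatch means no two workers ever receive the same URL, which is one simple way to share load without coordination between crawlers. A real crawler would also need politeness delays, robots.txt handling, and link extraction, all of which are left out here.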

Authors and Affiliations

Nidhi Tyagi, Deepti Gupta

How To Cite

Nidhi Tyagi, Deepti Gupta (2010). A Novel Architecture for Domain Specific Parallel Crawler. Indian Journal of Computer Science and Engineering, 1(1), 44-53. https://europub.co.uk/articles/-A-119118