A Novel Architecture for Domain Specific Parallel Crawler

Journal Title: Indian Journal of Computer Science and Engineering - Year 2010, Vol 1, Issue 1

Abstract

The World Wide Web is an interlinked collection of billions of documents formatted in HTML. Because the web is large and constantly changing, traversing and processing all the URLs found in web documents has become a challenge, making it imperative to parallelize the crawling process. Crawling can be parallelized further as a collection of crawler workers that download information from the web in parallel. This paper proposes a novel parallel crawler architecture based on domain-specific crawling. The architecture makes the crawling task more effective and scalable, and it shares the load among different crawlers, each of which downloads, in parallel with the others, the web pages whose URLs belong to a specific domain.
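
A minimal Python sketch of the partitioning idea described in the abstract follows. It rests on assumptions that are not taken from the paper. A URL's "domain" is taken to be the last label of its host name (for example com, edu, org). Each domain gets its own crawler worker, and the workers run concurrently in a thread pool. The helper names (domain_of, partition_by_domain, crawl_domain, parallel_crawl) are hypothetical. Downloading is delegated to a caller-supplied fetch function so the sketch runs offline.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def domain_of(url: str) -> str:
    """Hypothetical partitioning key: the last label of the host name (e.g. 'edu')."""
    host = urlsplit(url).hostname or ""
    return host.rsplit(".", 1)[-1].lower() if host else "unknown"


def partition_by_domain(urls):
    """Group URLs into one queue per domain, dropping duplicate URLs."""
    queues = defaultdict(list)
    seen = set()
    for url in urls:
        if url not in seen:
            seen.add(url)
            queues[domain_of(url)].append(url)
    return dict(queues)


def crawl_domain(domain, urls, fetch):
    """One crawler worker: downloads every URL assigned to its domain."""
    return domain, [(url, fetch(url)) for url in urls]


def parallel_crawl(urls, fetch, max_workers=4):
    """Run one worker per domain concurrently and collect results keyed by domain."""
    queues = partition_by_domain(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(crawl_domain, d, q, fetch) for d, q in queues.items()]
        return dict(f.result() for f in futures)


if __name__ == "__main__":
    seeds = [
        "http://www.example.com/a",
        "http://www.example.edu/b",
        "http://www.example.com/c",
        "http://www.example.org/",
    ]
    # Stand-in fetcher so the sketch runs offline; a real worker would issue HTTP requests.
    fake_fetch = lambda url: f"<html>{url}</html>"
    for domain, pages in sorted(parallel_crawl(seeds, fake_fetch).items()):
        print(domain, [u for u, _ in pages])
```

Because the domain queues are disjoint and duplicates are dropped up front, no two workers download the same URL. This is one simple way to share load without coordination between workers. A full crawler would also need link extraction, politeness delays, and a policy for domains of very different sizes.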

Authors and Affiliations

Nidhi Tyagi, Deepti Gupta

How To Cite

Nidhi Tyagi, Deepti Gupta (2010). A Novel Architecture for Domain Specific Parallel Crawler. Indian Journal of Computer Science and Engineering, 1(1), 44-53. https://europub.co.uk/articles/-A-119118