A Novel Architecture for Domain Specific Parallel Crawler
Journal Title: Indian Journal of Computer Science and Engineering - Year 2010, Vol 1, Issue 1
Abstract
The World Wide Web is an interlinked collection of billions of documents formatted using HTML. Because the web is growing and dynamic, traversing and handling all the URLs in web documents has become a challenge, which makes it imperative to parallelize the crawling process. The crawling process is parallelized in the form of an ecology of crawler workers that download information from the web in parallel. This paper proposes a novel parallel crawler architecture based on domain-specific crawling. It makes the crawling task more effective and scalable, and shares the load among crawlers that download, in parallel, web pages related to URLs of different domains.
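The idea of domain-specific parallel crawling can be illustrated with a minimal sketch. It is not the architecture proposed in the paper, whose components are not described in this abstract. It assumes that "domain" means the last label of a URL's hostname (for example `edu` or `com`), and the names `domain_of`, `partition_by_domain`, `crawl_domain`, `parallel_crawl`, and `fake_fetch` are hypothetical. The download step is a stand-in function, so the sketch runs without network access.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


def domain_of(url):
    """Return the last label of the hostname (e.g. 'edu'), or 'unknown'.

    Assumption: 'domain' is read as the top-level suffix of the host.
    """
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1] if "." in host else "unknown"


def partition_by_domain(urls):
    """Group URLs into one queue per domain, dropping duplicate URLs."""
    queues = defaultdict(list)
    seen = set()
    for url in urls:
        if url not in seen:
            seen.add(url)
            queues[domain_of(url)].append(url)
    return dict(queues)


def crawl_domain(domain, urls, fetch):
    """One crawler worker: download every URL in its own domain queue."""
    return domain, {url: fetch(url) for url in urls}


def parallel_crawl(urls, fetch, max_workers=4):
    """Submit one task per domain queue and collect the pages by domain."""
    queues = partition_by_domain(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(crawl_domain, domain, queue, fetch)
            for domain, queue in queues.items()
        ]
        return dict(future.result() for future in futures)


def fake_fetch(url):
    """Stand-in for a real HTTP download so the sketch runs offline."""
    return f"<html>{url}</html>"
```

Calling `parallel_crawl(seed_urls, fake_fetch)` returns a dictionary keyed by domain, each value mapping a URL to its downloaded content. Because each domain has its own queue, the load is split across workers and no URL is fetched by two of them.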
Authors and Affiliations
Nidhi Tyagi, Deepti Gupta