An Improved Approach to perform Crawling and avoid Duplicate Web Pages

Abstract

A web search often returns many duplicate web pages or websites, because similar pages are hosted on different web servers. We propose a web crawling approach to detect and avoid duplicate or near-duplicate web pages. The proposed work presents a keyword-prioritization-based approach to identify such pages on the web. Once these pages are identified, the crawler can avoid them, which optimizes the web search.
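
The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a keyword-prioritized duplicate check could look, not the authors' method. It assumes that each page is reduced to its most frequent non-stop-word terms and that two pages count as near duplicates when the Jaccard similarity of those keyword sets reaches a threshold. The names `top_keywords`, `jaccard`, `DuplicateFilter`, the stop-word list, and the values of `k` and `threshold` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real crawler would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "on", "for"}


def top_keywords(text, k=5):
    """Return the k most frequent non-stop-word terms as a frozenset."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return frozenset(word for word, _ in ranked[:k])


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class DuplicateFilter:
    """Remembers keyword sets of crawled pages and flags near duplicates."""

    def __init__(self, threshold=0.8, k=5):
        self.threshold = threshold  # assumed similarity cut-off
        self.k = k                  # assumed number of prioritized keywords
        self.seen = []              # list of (url, keyword set) pairs

    def check(self, url, text):
        """Return the URL of an earlier near duplicate, or None.

        A page that matches nothing seen so far is recorded as new.
        """
        keywords = top_keywords(text, self.k)
        for seen_url, seen_keywords in self.seen:
            if jaccard(keywords, seen_keywords) >= self.threshold:
                return seen_url
        self.seen.append((url, keywords))
        return None
```

A crawler built on this sketch would call `check(url, text)` after fetching each page and skip indexing whenever a URL is returned. The linear scan over `seen` is kept for clarity; it would not scale to a large crawl without an index over the keyword sets.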

Authors and Affiliations

Dhiraj Khurana, Satish Kumar

How To Cite

Dhiraj Khurana, Satish Kumar (2012). An Improved Approach to perform Crawling and avoid Duplicate Web Pages. International Journal of Computer Science and Management Studies (IJCSMS) www.ijcsms.com, 12(0), 358-361. https://europub.co.uk/articles/-A-87024