An Improved Approach to perform Crawling and avoid Duplicate Web Pages

Abstract

A web search often returns many duplicate pages or websites, because similar pages are hosted on different web servers. We propose a web crawling approach that detects and avoids duplicate and near-duplicate web pages. The proposed work presents a keyword-prioritization-based approach for identifying such pages across the web. Once these pages are identified, they can be avoided, which optimizes the web search.

Authors and Affiliations

Dhiraj Khurana, Satish Kumar

How To Cite

Dhiraj Khurana, Satish Kumar (2012). An Improved Approach to perform Crawling and avoid Duplicate Web Pages. International Journal of Computer Science and Management Studies (IJCSMS) www.ijcsms.com, 12(0), 358-361. https://europub.co.uk/articles/-A-87024