Query Based Duplicate Data Detection on WWW

Journal Title: International Journal on Computer Science and Engineering - Year 2010, Vol 2, Issue 4

Abstract

The problem of finding relevant documents has become much more prominent due to the presence of duplicate data on the WWW. This redundancy increases the time users spend seeking the desired information within the search results, while in general most users just want to cull through tens of result pages to find new or different results. The identification of similar or near-duplicate pairs in a large collection is a significant problem with widespread applications. A contemporary materialization of the problem is the efficient identification of near-duplicate Web pages, which is particularly challenging at web scale because of the volume of data. Therefore, a mechanism needs to be introduced for detecting duplicate data so that relevant search results can be provided to the user. In this paper, an architecture is proposed that introduces methods that run online as well as offline, on the basis of favored and disfavored user queries, to detect duplicates and near-duplicates.

Authors and Affiliations

Ranjna Gupta, Neelam Duhan, A. K. Sharma, Neha Aggarwal

How To Cite

Ranjna Gupta, Neelam Duhan, A. K. Sharma, Neha Aggarwal (2010). Query Based Duplicate Data Detection on WWW. International Journal on Computer Science and Engineering, 2(4), 1395-1400. https://europub.co.uk/articles/-A-113538