Query Based Duplicate Data Detection on WWW

Journal Title: International Journal on Computer Science and Engineering - Year 2010, Vol 2, Issue 4

Abstract

The problem of finding relevant documents has become much more prominent due to the presence of duplicate data on the WWW. This redundancy increases the time users spend seeking the desired information within the search results, while most users just want to cull through tens of result pages to find new or different results. The identification of similar or near-duplicate pairs in a large collection is a significant problem with widespread applications. A contemporary instance of the problem is the efficient identification of near-duplicate Web pages, which is challenging at web scale due to the volume of data. Therefore, a mechanism is needed for detecting duplicate data so that relevant search results can be provided to the user. This paper proposes an architecture whose methods run online as well as offline, on the basis of favored and disfavored user queries, to detect duplicates and near duplicates.

Authors and Affiliations

Ranjna Gupta, Neelam Duhan, A. K. Sharma, Neha Aggarwal

How To Cite

Ranjna Gupta, Neelam Duhan, A. K. Sharma, Neha Aggarwal (2010). Query Based Duplicate Data Detection on WWW. International Journal on Computer Science and Engineering, 2(4), 1395-1400. https://europub.co.uk/articles/-A-113538