Query Based Duplicate Data Detection on WWW
Journal Title: International Journal on Computer Science and Engineering - Year 2010, Vol 2, Issue 4
Abstract
The problem of finding relevant documents has become much more prominent due to the presence of duplicate data on the WWW. This redundancy increases the time users spend seeking the desired information within the search results, while in general most users just want to cull through tens of result pages to find new or different results. The identification of similar or near-duplicate pairs in a large collection is a significant problem with widespread applications. Another contemporary materialization of the problem is the efficient identification of near-duplicate Web pages. This is certainly challenging at web scale due to the voluminous data. Therefore, a mechanism needs to be introduced for detecting duplicate data so that relevant search results can be provided to the user. In this paper, an architecture is proposed that introduces methods that run online as well as offline, on the basis of favored and disfavored user queries, to detect duplicates and near duplicates.
Authors and Affiliations
Ranjna Gupta, Neelam Duhan, A. K. Sharma, Neha Aggarwal
User Suggestions Extraction from customer Reviews
Customer review is a major criterion for the improvement of the quality of services rendered and enhancement of the deliverables. Blogs, articles and discussion forums, provide manufacturers or sellers with a good unders...
The proposed quantum computational basis of deep ecology: its implications for agriculture
Quantum computation has been proposed to generate consciousness. The terms atman field and consciousness vector have also been used to describe the properties of consciousness. It has also been proposed that the human ac...
A Novel Routing Algorithm Based on Link Failure Localization for MANET
The routing in Mobile Ad hoc Network (MANET) is a critical task due to dynamic topology. Many routing protocols were proposed which are categorized as proactive and reactive routing protocols. Route maintenance is a grea...
Relational Peer Data Sharing Settings and Consistent Query Answers
In this paper, we study the problem of consistent query answering in peer data sharing systems. In a peer data sharing system, databases in peers are designed and administered autonomously and acquaintances between peers...
An Invisible Zero Watermarking Algorithm using Combined Image and Text for Protecting Text Documents
Authentication and copyright protection for digital contents over the Internet can be achieved through digital watermarking. The major components of the Internet are textual contents. Hence protection of plain text docum...