Query Based Duplicate Data Detection on WWW
Journal Title: International Journal on Computer Science and Engineering - Year 2010, Vol 2, Issue 4
Abstract
The problem of finding relevant documents has become much more prominent due to the presence of duplicate data on the WWW. This redundancy increases the time users spend seeking the desired information within the search results, while in general most users just want to cull through tens of result pages to find new or different results. The identification of similar or near-duplicate pairs in a large collection is a significant problem with widespread applications. A contemporary instance of the problem is the efficient identification of near-duplicate Web pages, which is challenging at web scale because of the volume of data. Therefore, a mechanism needs to be introduced for detecting duplicate data so that relevant search results can be provided to the user. This paper proposes an architecture whose methods run online as well as offline, on the basis of favored and disfavored user queries, to detect duplicates and near-duplicates.
Authors and Affiliations
Ranjna Gupta, Neelam Duhan, A. K. Sharma, Neha Aggarwal
Quantum computation and Biological stress: A Hypothesis
We propose that biological systems may behave as quantum computers. We have earlier hypothesized that patterns of quantum computation may be altered in stress and this leads to the change in the consciousness vector of bi...
Approaches for Intelligent Traffic System: A Survey
This survey presents various approaches for intelligent traffic systems. The potential research fields in which Intelligent Traffic System emerges as an important application area are highlighted and various issues have...
A Handoff Technique to Reduce False-Handoff Probability in Next Generation Wireless Networks
Next Generation Wireless Systems (NGWS) include co-existence of current wireless technologies such as WLANs, WiMAX, General Packet Radio Service (GPRS) and Universal Mobile Telecommunications System (UMTS). The most impo...
Estimation of Solar Radiation at a Particular Place: Comparative study between Soft Computing and Statistical Approach
This study focuses on the development of a connectionist model, such as a neural-network-based method, to efficiently predict the solar radiation of a particular place. Here a comparative study is given between a conventional appr...
MEASURING THE QUALITY OF OBJECT ORIENTED SOFTWARE MODULARIZATION DEFINING METRICS AND ALGORITHM
We propose a system to measure the quality of modularization of object-oriented software systems. Our work is proposed in three parts as follows: MODULE 1: DEFINING METRICS FOR OBJECT ORIENTED SOFTWARE AND ALGORITHM M...