Query Based Duplicate Data Detection on WWW
Journal Title: International Journal on Computer Science and Engineering - Year 2010, Vol 2, Issue 4
Abstract
The problem of finding relevant documents has become much more prominent due to the presence of duplicate data on the WWW. This redundancy increases the users’ seek time to find the desired information within the search results, while in general most users just want to cull through tens of result pages to find new or different results. The identification of similar or near-duplicate pairs in a large collection is a significant problem with widespread applications. Another contemporary materialization of the problem is the efficient identification of near-duplicate Web pages, which is certainly challenging at web scale because of the volume of data. Therefore, a mechanism needs to be introduced for detecting duplicate data so that relevant search results can be provided to the user. In this paper, an architecture is proposed that introduces methods that run online as well as offline, on the basis of favored and disfavored user queries, to detect duplicates and near duplicates.
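To make the notion of a "near-duplicate pair" concrete, the following is a minimal sketch of one widely used generic technique: word shingling combined with Jaccard similarity. It is not the architecture proposed in the paper, whose query-based online and offline methods are not described in this abstract. The function names, the shingle size `k`, and the `threshold` value are all illustrative assumptions.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (as tuples) of a text.

    Hypothetical helper: k=3 is an arbitrary illustrative choice.
    """
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short texts yield a single shingle (or none if the text is empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two sets."""
    if not a and not b:
        return 1.0  # two empty documents are treated as identical
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a, doc_b, k=3, threshold=0.8):
    """Flag two documents as near duplicates when their shingle sets
    overlap at or above the (assumed) similarity threshold."""
    return jaccard(shingles(doc_a, k), shingles(doc_b, k)) >= threshold
```

Comparing every pair of pages this way does not scale to web-sized collections, which is the difficulty the abstract points to; practical systems typically replace the exact set comparison with compact fingerprints or sketches.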
Authors and Affiliations
Ranjna Gupta, Neelam Duhan, A. K. Sharma, Neha Aggarwal
The Efficient Ant Routing Protocol for MANET
In recent years, mobile computing and wireless networks have witnessed a tremendous rise in popularity and technological advancement. The basic routing problem in MANET deals with methods to transport a packet across a n...
Automated System for interpreting Non-verbal Communication in Video Conferencing
Gesture is a form of non-verbal, action-based communication made with a part of the body and used instead of and/or in combination with verbal communication. People frequently use gestures for more effective inter-person...
Developing a Mobile Adaptive Test (MAT) in an M-Learning Environment for Android Based 3G Mobile Devices
M-Learning (e-Learning through mobile device) is gaining its importance worldwide among the Learning Community owing to the ubiquitous and powerful computing nature of the Mobile Devices. In an M-Learning environment, St...
AN EFFICIENT APPROACH FOR EXTRACTION OF LINEAR FEATURES FROM HIGH RESOLUTION INDIAN SATELLITE IMAGERIES
This paper presents an object-oriented feature extraction approach in order to classify the linear features like drainage, roads etc. from high resolution Indian satellite imageries. It starts with the multiresolution se...
Cooperative Communications: A New Trend in the Wireless World
The wireless channel, while offering independence of movement, also introduces unreliability in the messages received at the destination. Various strategies have been introduced so far to mitigate the effects of the chan...