A Review on Effective and Efficient Detection of Duplicates in Data

Abstract

Duplicate detection is an essential step in data cleansing. Duplicate detection techniques link records that refer to the same entity in one or more data sets where no unique identifier is available. Duplicate detection is also known as record linkage. Its major challenges are computational complexity and linkage accuracy. Blocking and windowing are two approaches used in duplicate detection. Windowing, also known as the Sorted Neighbourhood Method, sorts the records and compares only those that fall within a window as it slides over the sorted list. Blocking partitions the records into blocks and compares only records within the same block. This paper focuses on maintaining and improving both the efficiency and the effectiveness of duplicate detection by using adaptive windowing and blocking algorithms.
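
The two approaches named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the string similarity measure (`difflib.SequenceMatcher`), the 0.85 threshold, and the fixed window size are all assumptions chosen for illustration, and the adaptive variants the paper reviews would vary the window or block size instead of fixing it.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def similar(a, b, threshold=0.85):
    # Hypothetical matcher: case-insensitive string similarity ratio.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def sorted_neighbourhood(records, key, window=3):
    # Windowing: sort by a key, then compare each record only with the
    # next (window - 1) records in sorted order.
    ordered = sorted(records, key=key)
    pairs = set()
    for i, rec in enumerate(ordered):
        for other in ordered[i + 1 : i + window]:
            if similar(rec, other):
                pairs.add(tuple(sorted((rec, other))))
    return pairs


def blocking(records, block_key):
    # Blocking: partition records by a key, then compare all pairs
    # inside each block and none across blocks.
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec)
    pairs = set()
    for block in blocks.values():
        for a, b in combinations(block, 2):
            if similar(a, b):
                pairs.add(tuple(sorted((a, b))))
    return pairs
```

Both functions avoid comparing every record with every other record, which is where the efficiency gain comes from. The cost is that true duplicates can be missed when they sort far apart or land in different blocks, which is the accuracy trade-off the abstract refers to.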

Authors and Affiliations

Varsha Wandhekar, Arti Mohanpurkar

How To Cite

Varsha Wandhekar, Arti Mohanpurkar (2014). A Review on Effective and Efficient Detection of Duplicates in Data. International Journal for Research in Applied Science and Engineering Technology (IJRASET), 2(11), -. https://europub.co.uk/articles/-A-19038