A Review on Effective and Efficient Detection of Duplicates in Data

Abstract

Duplicate detection is an essential step in data cleansing. Duplicate detection techniques link records that refer to the same real-world entity within one or more data sets when no unique identifier is available; the task is also called record linkage. Its two major challenges are computational complexity and linkage accuracy. Blocking and windowing are two approaches used in duplicate detection. Windowing, as used in the Sorted Neighbourhood Method, sorts the records and compares only the records that fall within a window as it slides over the sorted list. Blocking partitions the records into blocks and compares only records that belong to the same block. The main focus of this paper is maintaining and improving both the efficiency and the effectiveness of duplicate detection by using adaptive windowing and blocking algorithms.
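To make the two strategies concrete, here is a minimal Python sketch of a fixed-size sorted-neighbourhood window and of standard key-based blocking. It is not the authors' adaptive algorithm, which the abstract does not detail. The `difflib.SequenceMatcher` matcher, the 0.8 threshold, the sort and blocking keys (`str.lower`, the hypothetical `surname_key`), and the sample records are all illustrative assumptions. Adaptive variants would adjust the window or block size rather than fix it in advance.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def is_duplicate(a, b, threshold=0.8):
    """Hypothetical matcher: two records match if their lowercased strings are similar enough."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def sorted_neighbourhood(records, key, window=3, threshold=0.8):
    """Windowing: sort by `key`, then compare each record with the next window - 1 records."""
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    pairs = set()
    for pos, i in enumerate(order):
        # Only records inside the sliding window are compared with record i.
        for j in order[pos + 1:pos + window]:
            if is_duplicate(records[i], records[j], threshold):
                pairs.add((min(i, j), max(i, j)))
    return pairs


def blocking(records, key, threshold=0.8):
    """Blocking: partition records by `key` and compare all pairs within each block."""
    blocks = defaultdict(list)
    for i, record in enumerate(records):
        blocks[key(record)].append(i)
    pairs = set()
    for members in blocks.values():
        # Records in different blocks are never compared.
        for i, j in combinations(members, 2):
            if is_duplicate(records[i], records[j], threshold):
                pairs.add((i, j))
    return pairs


def surname_key(record):
    """Hypothetical blocking key: first three letters of the surname."""
    return record.split(",")[0].split()[-1][:3].lower()


# Illustrative sample data (assumed, not from the paper).
records = [
    "John Smith, 12 Main St",
    "Jon Smith, 12 Main St",
    "Mary Jones, 5 Oak Ave",
    "Mary Jones, 5 Oak Avenue",
    "Peter Brown, 9 Elm Rd",
]

if __name__ == "__main__":
    print(sorted(sorted_neighbourhood(records, key=str.lower, window=2)))  # [(0, 1), (2, 3)]
    print(sorted(blocking(records, key=surname_key)))                      # [(0, 1), (2, 3)]
```

Both functions return index pairs judged to be duplicates. With a window of size w, the sorted-neighbourhood pass makes roughly (w − 1)·n comparisons instead of the n(n − 1)/2 of an exhaustive comparison, while the cost of blocking depends on the block sizes. Both trade some recall for speed: true duplicates that end up outside the same window or block are never compared.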

Authors

Varsha Wandhekar, Arti Mohanpurkar

How To Cite

Varsha Wandhekar, Arti Mohanpurkar (2014). A Review on Effective and Efficient Detection of Duplicates in Data. International Journal for Research in Applied Science and Engineering Technology (IJRASET), 2(11), -. https://europub.co.uk/articles/-A-19038