A Review on Effective and Efficient Detection of Duplicates in Data

Abstract

Duplicate detection is an essential step in data cleansing. Duplicate detection techniques link records that refer to the same entity in one or more data sets when no unique identifier is available. Duplicate detection is also called record linkage. Its major challenges are computational complexity and linkage accuracy. Blocking and windowing are two approaches used in duplicate detection. Windowing, also known as the Sorted Neighbourhood Method, sorts the records and compares only those that fall within a window as it slides over the sorted list. Blocking partitions the records into groups and compares records only within the same group. This paper focuses on maintaining and improving both the efficiency and the effectiveness of duplicate detection by using adaptive windowing and blocking algorithms.
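
To make the two approaches concrete, the following is a minimal sketch of basic fixed-window Sorted Neighbourhood and basic blocking; it does not reproduce the adaptive variants reviewed in the paper. The function names, the string-similarity matcher, the 0.8 threshold, the choice of sorting and blocking keys, and the sample records are all assumptions made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def similar(a, b, threshold=0.8):
    """Hypothetical matcher: case-insensitive similarity ratio at or above a threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def sorted_neighbourhood(records, key, window=3):
    """Windowing: sort by key, then compare each record with the next window-1 records."""
    ordered = sorted(records, key=key)
    pairs = set()
    for i, rec in enumerate(ordered):
        for other in ordered[i + 1 : i + window]:
            if similar(rec, other):
                pairs.add(tuple(sorted((rec, other))))
    return pairs


def blocking(records, block_key):
    """Blocking: partition records by block key, then compare only within each partition."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec)
    pairs = set()
    for members in blocks.values():
        for a, b in combinations(members, 2):
            if similar(a, b):
                pairs.add(tuple(sorted((a, b))))
    return pairs


# Illustrative records only; not data from the paper.
records = ["John Smith", "Jon Smith", "Mary Jones", "Marie Jones", "Alan Turing"]

# Sorting key assumed to be the lower-cased name; block key assumed to be its first letter.
snm_pairs = sorted_neighbourhood(records, key=str.lower, window=3)
block_pairs = blocking(records, block_key=lambda r: r[0].lower())
```

Both functions avoid comparing every record with every other record, which is where the efficiency gain comes from. The trade-off is accuracy: a true duplicate is missed if the sort order places it outside the window, or if the block key assigns it to a different partition.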

Authors and Affiliations

Varsha Wandhekar, Arti Mohanpurkar


How To Cite

Varsha Wandhekar, Arti Mohanpurkar (2014). A Review on Effective and Efficient Detection of Duplicates in Data. International Journal for Research in Applied Science and Engineering Technology (IJRASET), 2(11), -. https://europub.co.uk/articles/-A-19038