A Detailed Survey on Various Record Deduplication Methods 

Abstract

Deduplication is a key operation when integrating data from multiple sources. Data preprocessing is required to obtain higher-quality information and a simpler data representation, and data cleaning is one of the preprocessing steps. Data cleaning includes parsing, data transformation, duplicate elimination, and statistical methods. Two records that represent the same real-world entity are called duplicate records, and the problem of detecting and eliminating them is called record deduplication. This paper presents an analysis of record deduplication techniques and algorithms that detect and remove duplicate records.
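
To make the definition of duplicate records concrete, the following is a minimal illustrative sketch and not a method taken from the surveyed paper. It assumes records are tuples of string fields, uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure, and treats two records as duplicates when their average field similarity reaches a hypothetical threshold of 0.85. The function names `normalize`, `similarity`, and `deduplicate` are made up for this example.

```python
from difflib import SequenceMatcher


def normalize(record):
    """Lowercase each field and collapse runs of whitespace (a simple cleaning step)."""
    return tuple(" ".join(str(value).lower().split()) for value in record)


def similarity(a, b):
    """Average per-field string similarity, a value between 0 and 1."""
    scores = [SequenceMatcher(None, x, y).ratio() for x, y in zip(a, b)]
    return sum(scores) / len(scores)


def deduplicate(records, threshold=0.85):
    """Keep the first record of each group of near-duplicates.

    The threshold is an assumed value for illustration; real systems tune it.
    """
    kept = []  # pairs of (original record, normalized record)
    for record in records:
        norm = normalize(record)
        is_duplicate = any(
            similarity(norm, kept_norm) >= threshold for _, kept_norm in kept
        )
        if not is_duplicate:
            kept.append((record, norm))
    return [record for record, _ in kept]


records = [
    ("John Smith", "12 Main Street"),
    ("john  smith", "12 Main Street"),  # same entity, different case and spacing
    ("Jon Smith", "12 Main St."),       # likely the same entity, with a typo
    ("Mary Jones", "4 Oak Avenue"),
]
unique = deduplicate(records)
```

This sketch compares every record against every kept record, which is quadratic in the number of records; it is meant only to show the detect-then-eliminate idea, not an efficient approach.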

Authors and Affiliations

Lalitha. L, Maheswari. B, Dr. Karthik. S

How To Cite

Lalitha. L, Maheswari. B, Dr. Karthik. S (2012). A Detailed Survey on Various Record Deduplication Methods. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 1(8), 160-163. https://europub.co.uk/articles/-A-125836