A Review on Effective and Efficient Detection of Duplicates in Data

Abstract

Duplicate detection is an essential step in data cleansing. Duplicate detection techniques link records that refer to the same entity in one or more data sets when no unique identifier is available. Duplicate detection is also known as record linkage. Its major challenges are computational complexity and linkage accuracy. Blocking and windowing are two approaches used to limit the number of record comparisons. Blocking partitions the records into groups and compares records only within each group. Windowing, as in the Sorted Neighbourhood Method, sorts the records and slides a window over them, comparing only the records that fall inside the window. This paper focuses on maintaining and improving both the efficiency and the effectiveness of duplicate detection by using adaptive windowing and blocking algorithms.
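
The two candidate-selection strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function names, the key functions, and the fixed window size are assumptions made for illustration, and the adaptive variants the paper reviews would vary the window or block size rather than keep it constant. Both functions return only candidate pairs of record indices; deciding whether a pair is a true duplicate would need a separate similarity measure.

```python
from collections import defaultdict
from itertools import combinations


def blocking_pairs(records, block_key):
    """Group records by block_key and pair up records within each group.

    block_key is a hypothetical, caller-supplied function (for example,
    the first letter of a name). Returns a set of (i, j) index pairs, i < j.
    """
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[block_key(rec)].append(idx)
    pairs = set()
    for members in blocks.values():
        # Indices are appended in ascending order, so each pair has i < j.
        pairs.update(combinations(members, 2))
    return pairs


def sorted_neighbourhood_pairs(records, sort_key, window=3):
    """Sort records by sort_key, then pair each record with the next
    (window - 1) records in sorted order.

    The window size is fixed here (an assumption); an adaptive method
    would grow or shrink it. Returns a set of (i, j) index pairs, i < j.
    """
    order = sorted(range(len(records)), key=lambda i: sort_key(records[i]))
    pairs = set()
    for pos, i in enumerate(order):
        for j in order[pos + 1 : pos + window]:
            pairs.add((min(i, j), max(i, j)))
    return pairs


if __name__ == "__main__":
    names = ["smith john", "smyth john", "jones mary", "smith jon"]
    print(sorted(blocking_pairs(names, lambda r: r[0])))
    print(sorted(sorted_neighbourhood_pairs(names, lambda r: r, window=2)))
```

With four records, an exhaustive comparison would examine six pairs; each strategy above examines three in this example, which is the kind of saving that motivates blocking and windowing on large data sets. The trade-off is that a true duplicate pair is missed if its records land in different blocks or too far apart in the sort order.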

Authors and Affiliations

Varsha Wandhekar, Arti Mohanpurkar

How To Cite

Varsha Wandhekar, Arti Mohanpurkar (2014). A Review on Effective and Efficient Detection of Duplicates in Data. International Journal for Research in Applied Science and Engineering Technology (IJRASET), 2(11), -. https://europub.co.uk/articles/-A-19038