A Novel Approach for Eliminating Duplicates in Large Dataset

Abstract

Duplicate detection is a serious problem in many applications, including personal details management, customer affiliation management, and data mining. This survey covers duplicate record detection techniques for both small and large datasets. To detect duplicates with less execution time and without degrading dataset quality, progressive methods such as Progressive Blocking and the Progressive Sorted Neighborhood Method (PSNM) are used. In this model, PSNM detects duplicates using a parallel approach. The Progressive Blocking algorithm targets large datasets, where finding duplicates takes an immense amount of time. Both algorithms are used to enhance the duplicate detection system. Using these algorithms, the efficiency can be doubled over the conventional duplicate detection method. Several data analysis methods with different approaches to duplicate detection are also studied here.
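
The progressive sorted neighborhood idea named in the abstract can be illustrated with a minimal, sequential sketch. It is not the authors' implementation and leaves out the parallel execution and the Progressive Blocking variant. The function names, the string similarity measure (Python's `difflib.SequenceMatcher`), the window size, and the threshold are all assumptions chosen for illustration. Records are sorted by a key, then compared at sort distance 1, then 2, and so on, so the most likely duplicates are reported first.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity in [0, 1]; an assumed stand-in for a real record comparator."""
    return SequenceMatcher(None, a, b).ratio()


def progressive_sorted_neighborhood(records, key, max_window=4, threshold=0.8):
    """Yield candidate duplicate pairs (i, j), closest sort neighbours first.

    records    -- list of strings to deduplicate
    key        -- sorting key function (hypothetical; a real system would tune it)
    max_window -- pairs are compared up to sort distance max_window - 1
    threshold  -- minimum similarity for a pair to count as a duplicate
    """
    # Sort record indices by the key so similar records end up close together.
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    # Grow the comparison distance step by step: distance 1 first, then 2, ...
    for distance in range(1, max_window):
        for pos in range(len(order) - distance):
            i, j = order[pos], order[pos + distance]
            if similarity(records[i], records[j]) >= threshold:
                yield (min(i, j), max(i, j))


records = ["John Smith", "Jon Smith", "Alice Brown", "Jon Smyth", "Bob Stone"]
pairs = list(progressive_sorted_neighborhood(records, key=str.lower))
for i, j in pairs:
    print(records[i], "<->", records[j])
```

Because the function is a generator, a caller can stop consuming pairs once a time budget runs out and still keep the matches found so far. In the sample list, near-duplicates that are adjacent after sorting are reported before pairs that sit two positions apart.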

Authors and Affiliations

N Chaitanya, Appini Narayanarao, M. Srinivasulu

How To Cite

N Chaitanya, Appini Narayanarao, M. Srinivasulu (2016). A Novel Approach for Eliminating Duplicates in Large Dataset. International Journal for Research in Applied Science and Engineering Technology (IJRASET), 4(8), -. https://europub.co.uk/articles/-A-22510