A Novel Approach for Eliminating Duplicates in Large Dataset
Journal Title: International Journal for Research in Applied Science and Engineering Technology (IJRASET) - Year 2016, Vol 4, Issue 8
Abstract
One of the serious problems faced in several applications with personal details management, customer affiliation management, data mining, etc is duplicate detection. This survey deals with the various duplicate record detection techniques in both small and large datasets. To detect the duplicity with less time of execution and also without disturbing the dataset quality, methods like Progressive Blocking and Progressive Neighborhood are used. Progressive sorted neighborhood method also called as PSNM is used in this model for finding or detecting the duplicate in a parallel approach. Progressive Blocking algorithm works on large datasets where finding duplication requires immense time. These algorithms are used to enhance duplicate detection system. The efficiency can be doubled over the conventional duplicate detection method using this algorithm. Several different methods of data analysis are studied here with various approaches for duplicate detection.
Authors and Affiliations
N Chaitanya, Appini Narayanarao, M. Srinivasulu
Impulse Noise Reduction and Enhancement of Gray Scale and Color Images
A new filtering algorithm, adaptive weight algorithm, is developed for the removal of impulse noise present in the Images. This approach has two major steps, in that, first is to detect noise pixels according to the cor...
A Review on Multi Objective Optimization in Turning Using Goal Programming and Taguchi Technique on CNC Lathe
The Call Of Recent Time Is To Make Precise And Dimensionally Accurate Product To Sustain In Competitive Market. In Machining Process, Surface Finish Is One Of The Most Significant Technical Requirements Of The Customer...
Study of Meta, Naïve Bayes and Decision Tree based Classifiers
Classification deals with the kind of data mining problem which are concerned with prediction. Its main task is to classify the data in order to make predictions about new data. In general Classification can be viewed a...
Person Identification Technique Using Human Iris Recognition
The security is an important aspect in our daily life whichever the system we consider security plays vital role. The biometric person identification technique iris is well suited to be applied to access control and pro...
Survey on Community and Social Networks in Big Data
Today’s world is interconnected through many types of links. These links include Web pages, blogs, and emails. This consider community mining and the mining of social networks as important topics. Big data has become an...