A Scalable Approach To Detect A Duplicate Data Using PSNM And PB Algorithm

Abstract

Nowadays, most data sets contain a substantial amount of duplicate data. Detecting redundant data on a data server remains an open research problem in data-intensive applications. Traditional duplicate detection methods take a long time to produce results on large data sets. This work proposes two algorithms, PSNM and PB, that maximize the gain of the overall process within the available time by reporting most results much earlier than traditional approaches. The work improves on this further by running both algorithms in parallel, so that duplicate records are computed efficiently. The algorithms dynamically adjust their behavior by automatically choosing optimal parameters, e.g., window sizes, block sizes, and sorting keys. Experimental results show that the proposed system outperforms state-of-the-art approaches in accuracy and efficiency.
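The abstract does not spell out the algorithms, so the following is only a minimal sketch of the progressive idea, assuming PSNM refers to a progressive sorted neighborhood method: records are sorted by a key, and pairs at rank distance 1 are compared before pairs at distance 2, and so on up to a window size, so the most likely duplicates are reported first. The names `progressive_snm` and `similar`, the similarity threshold, and the sample records are all hypothetical and not taken from the paper.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterator, List, Tuple


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical similarity test based on difflib's matching ratio."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def progressive_snm(
    records: List[str],
    sort_key: Callable[[str], str],
    max_window: int,
) -> Iterator[Tuple[int, int]]:
    """Yield index pairs of likely duplicates, closest sorted neighbours first."""
    # Sort record indices by the chosen sorting key.
    order = sorted(range(len(records)), key=lambda i: sort_key(records[i]))
    # Compare pairs at rank distance 1 first, then 2, up to max_window - 1.
    for dist in range(1, max_window):
        for pos in range(len(order) - dist):
            i, j = order[pos], order[pos + dist]
            if similar(records[i], records[j]):
                # Report the pair immediately instead of waiting for the full run.
                yield (min(i, j), max(i, j))


# Assumed sample data for illustration only.
records = ["john smith", "jon smith", "mary jones", "john smyth", "marie jones"]
pairs = list(progressive_snm(records, sort_key=str.lower, max_window=3))
```

Because the function is a generator, a caller can stop consuming results when its time budget runs out and still keep the pairs found so far, which is the behavior the abstract describes as reporting most results early. The window size and sorting key are fixed arguments here; the paper's automatic parameter selection is not modeled.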

Authors and Affiliations

P. Padmavathi, Mr. S. Dhanasekaran, Mr. A. Arockia Selvaraj

How To Cite

P. Padmavathi, Mr. S. Dhanasekaran, Mr. A. Arockia Selvaraj (2016). A Scalable Approach To Detect A Duplicate Data Using PSNM And PB Algorithm. International Journal for Research in Applied Science and Engineering Technology (IJRASET), 4(5), -. https://europub.co.uk/articles/-A-22138