Parallel and Multiple E-Data Distributed Process with Progressive Duplicate Detection Model

Abstract

At present, duplicate detection methods must process ever larger datasets in ever shorter time, which makes the datasets difficult to maintain. This project presents progressive duplicate detection algorithms that increase the efficiency of finding duplicates when execution time is limited: they maximize the gain of the overall process within the available time by reporting most results early. The experiments show that progressive algorithms can double the efficiency over time of traditional duplicate detection, improving on that baseline. Progressive duplicate detection identifies most duplicate pairs early in the detection process. Instead of reducing the overall time needed to finish the entire process, this approach tries to reduce the average time after which a duplicate is found.
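The progressive idea can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a sorted-neighborhood style ordering, in which records are sorted by a key and pairs that sit closest together in the sorted order are compared first, so that a run cut short by a comparison budget has still examined the most promising pairs. The function name `progressive_duplicates`, the `threshold` value, and the `max_comparisons` budget are hypothetical choices for illustration, and string similarity from Python's standard `difflib` stands in for whatever comparison the authors use.

```python
from difflib import SequenceMatcher


def progressive_duplicates(records, key=str.lower, threshold=0.85,
                           max_comparisons=None):
    """Yield likely duplicate pairs, most promising comparisons first.

    Assumption (not from the paper): records that sort next to each
    other are the most likely duplicates, so pairs at rank distance 1
    are compared first, then distance 2, and so on.
    """
    order = sorted(records, key=key)
    done = 0
    for distance in range(1, len(order)):
        for i in range(len(order) - distance):
            # Stop as soon as the (hypothetical) comparison budget is spent.
            if max_comparisons is not None and done >= max_comparisons:
                return
            a, b = order[i], order[i + distance]
            done += 1
            if SequenceMatcher(None, key(a), key(b)).ratio() >= threshold:
                yield a, b


records = ["John Smith", "Jon Smith", "Alice Brown", "Alice Browne", "Bob Stone"]

# Budget of 4 comparisons out of the 10 possible pairs.
early = list(progressive_duplicates(records, max_comparisons=4))

# No budget: every pair is compared.
full = list(progressive_duplicates(records))
```

In this toy input, the budgeted run and the exhaustive run report the same two pairs, which is the behaviour the abstract describes: the total time to finish is unchanged, but the duplicates are reported early.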

Authors and Affiliations

Yasvanthkumaar V, Sabitha S, Nithya Kalyani S.

  • EP ID EP404985
  • DOI 10.9756/BIJSESC.8384

How To Cite

Yasvanthkumaar V, Sabitha S, Nithya Kalyani S. (2018). Parallel and Multiple E-Data Distributed Process with Progressive Duplicate Detection Model. Bonfring International Journal of Software Engineering and Soft Computing, 8(1), 23-25. https://europub.co.uk/articles/-A-404985