An Efficient Approach towards Duplicate Detection System

Abstract

The volume of information on the web is very large, and the task of search engines has become increasingly complex because a single entity on the web may have two or more representations in databases. Duplicate detection is the process of identifying multiple representations of the same real-world entity. Because duplicate detection methods must process large datasets, identifying duplicate documents in a large database is a significant issue with widespread applications. This paper presents a review of various approaches to duplicate detection. The proposed system compares two duplicate detection methods. The first is based on two novel progressive duplicate detection algorithms that significantly increase the efficiency of finding duplicates when execution time is limited. The second is based on the Secure Hash Algorithm (SHA), which detects and deletes duplicate data; it performs the de-duplication task in order to reduce processing time and hash collisions.
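As a rough illustration of the two families of methods compared in the abstract, the following minimal Python sketch may help. It is not the authors' implementation. The function names, the choice of SHA-256, the sorted-neighborhood ordering, and the `max_window` and `threshold` parameters are all assumptions made for the example. The first function removes exact duplicates by hashing each record. The second reports near-duplicate pairs progressively, comparing the closest neighbors in sorted order first so that a run cut short by a time limit has already examined the most promising pairs.

```python
import hashlib
from difflib import SequenceMatcher


def dedupe_exact(records):
    """Keep the first occurrence of each record, keyed by its SHA-256 digest.

    Assumption: records are strings and only byte-identical copies count
    as duplicates.
    """
    seen = set()
    unique = []
    for record in records:
        digest = hashlib.sha256(record.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique


def progressive_pairs(records, max_window=3, threshold=0.8):
    """Yield likely duplicate pairs, nearest sorted neighbors first.

    Hypothetical sketch: records are sorted case-insensitively, then
    compared at rank distance 1, then 2, and so on up to max_window.
    A caller with limited time can stop consuming the generator early.
    """
    ordered = sorted(records, key=str.lower)
    for distance in range(1, max_window + 1):
        for i in range(len(ordered) - distance):
            a, b = ordered[i], ordered[i + distance]
            # Similarity ratio in [0, 1]; the threshold is an assumed cutoff.
            if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold:
                yield a, b


if __name__ == "__main__":
    rows = ["John Smith", "Alice Jones", "John Smith", "Jon Smith"]
    unique_rows = dedupe_exact(rows)
    print(unique_rows)
    for pair in progressive_pairs(unique_rows):
        print(pair)
```

Exact hashing is cheap and removes identical copies in one pass, but it cannot catch records that differ by even one character. The progressive pass covers those cases at a higher cost per comparison, which is why ordering the comparisons matters when time is limited.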

Authors and Affiliations

Miss. Ruchira Dhananjay Deshpande, Sonali Bodkhe

How To Cite

Miss. Ruchira Dhananjay Deshpande, Sonali Bodkhe (2017). An Efficient Approach towards Duplicate Detection System. International Journal for Research in Applied Science and Engineering Technology (IJRASET), 5(1), -. https://europub.co.uk/articles/-A-23038