An Effective Solution to Adequate and Operative Duplicate Detection in Stratified Data

Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2014, Vol 16, Issue 2

Abstract

Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. Although there is a long line of work on identifying duplicates in relational data, only a few solutions focus on duplicate detection in more complex hierarchical structures, such as XML data. This paper presents a novel method for XML duplicate detection, called XMLDup. XMLDup uses a Bayesian network to determine the probability of two XML elements being duplicates, considering not only the information within the elements but also the way that information is structured. In addition, to improve the efficiency of the network evaluation, a novel pruning strategy is presented that is capable of significant gains over the unoptimized version of the algorithm. Experiments show that the Bayesian network approach achieves high precision and recall scores on several data sets. XMLDup is also able to outperform another state-of-the-art duplicate detection solution, both in terms of efficiency, by up to 80%, and of effectiveness.
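The abstract does not give the network structure or its formulas, so the following is only a minimal sketch of the general idea, not XMLDup itself. It assumes that a duplicate score for two XML elements can be built as a product of per-tag factors (each at most 1) computed from the values and the nesting of their children, and that evaluation can stop early once the running product falls below a threshold. The function names, the string-similarity measure, and the 0.5 penalty for a missing child are all hypothetical choices made for illustration.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; an assumed stand-in for a value probability."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def duplicate_score(x: ET.Element, y: ET.Element, threshold: float) -> float:
    """Score in [0, 1] that x and y describe the same object.

    The score is a product of per-tag factors, each at most 1, so the
    running product is an upper bound on the final score. Once it drops
    below `threshold`, the remaining factors are skipped (pruning) and
    that upper bound is returned.
    """
    if x.tag != y.tag:
        return 0.0
    # Leaf elements: compare their text values directly.
    if len(x) == 0 and len(y) == 0:
        return text_similarity(x.text or "", y.text or "")

    score = 1.0
    tags = sorted({c.tag for c in x} | {c.tag for c in y})
    for tag in tags:
        xs = [c for c in x if c.tag == tag]
        ys = [c for c in y if c.tag == tag]
        if not xs or not ys:
            factor = 0.5  # hypothetical penalty for a missing child (assumption)
        else:
            # For each child of x, take its best match among same-tag children of y.
            best = [max(duplicate_score(cx, cy, 0.0) for cy in ys) for cx in xs]
            factor = sum(best) / len(best)
        score *= factor
        if score < threshold:
            return score  # pruned: the pair can no longer reach the threshold
    return score


if __name__ == "__main__":
    a = ET.fromstring("<person><name>John Smith</name><city>Lisbon</city></person>")
    b = ET.fromstring("<person><name>Jon Smith</name><city>Lisbon</city></person>")
    print(duplicate_score(a, b, threshold=0.5))
```

A pair would be reported as a duplicate when the returned score is at or above the threshold. Because every factor is at most 1, stopping early never changes which pairs pass the threshold; it only avoids comparing the remaining children of pairs that have already failed.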

Authors and Affiliations

A. Baladhandayutham, S. Roselin Mary

  • DOI 10.9790/0661-1628101105

How To Cite

A. Baladhandayutham, S. Roselin Mary (2014). An Effective Solution to Adequate and Operative Duplicate Detection in Stratified Data. IOSR Journals (IOSR Journal of Computer Engineering), 16(2), 101-105. https://europub.co.uk/articles/-A-142076