An Effective Solution to Adequate and Operative Duplicate Detection in Stratified Data

Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2014, Vol 16, Issue 2

Abstract

Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. Although there is a long line of work on identifying duplicates in relational data, only a few solutions focus on duplicate detection in more complex hierarchical structures, such as XML data. This paper presents a method for XML duplicate detection called XMLDup. XMLDup uses a Bayesian network to determine the probability of two XML elements being duplicates, considering not only the information within the elements but also the way that information is structured. In addition, to improve the efficiency of the network evaluation, a novel pruning strategy is presented that is capable of significant gains over the unoptimized version of the algorithm. Experiments show that the Bayesian network approach achieves high precision and recall scores on several data sets. XMLDup is also able to outperform another state-of-the-art duplicate detection solution, both in terms of efficiency (by up to 80%) and of effectiveness.
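The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' XMLDup implementation: it replaces the Bayesian network with a plain product of similarity scores, and the function names `text_sim` and `dup_score`, the string measure, the handling of missing children, and the sample records are all assumptions made for illustration. It shows only two things: the score depends on both the values and the tree structure, and a running upper bound lets a comparison stop early.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def text_sim(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for a leaf-level probability."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def dup_score(x: ET.Element, y: ET.Element, threshold: float = 0.0) -> float:
    """Score in [0, 1] that x and y describe the same object.

    Every factor is <= 1, so the running product is an upper bound on the
    final score. Once it drops below `threshold`, the pair is pruned and
    0.0 is returned without evaluating the remaining children.
    """
    score = text_sim(x.text or "", y.text or "")
    if score < threshold:
        return 0.0
    tags = {c.tag for c in x} | {c.tag for c in y}
    for tag in sorted(tags):
        xs, ys = x.findall(tag), y.findall(tag)
        if not xs or not ys:
            continue  # assumption: a missing child is treated as neutral
        # Average, over x's children, of the best match among y's children.
        best = [max(dup_score(cx, cy) for cy in ys) for cx in xs]
        score *= sum(best) / len(best)
        if score < threshold:
            return 0.0  # pruned: the bound is already below the threshold
    return score


# Hypothetical records, invented for this example.
a = ET.fromstring("<movie><title>The Matrix</title><year>1999</year>"
                  "<actor>Keanu Reeves</actor></movie>")
b = ET.fromstring("<movie><title>Matrix, The</title><year>1999</year>"
                  "<actor>K. Reeves</actor></movie>")
c = ET.fromstring("<movie><title>Toy Story</title><year>1995</year>"
                  "<actor>Tom Hanks</actor></movie>")

print(dup_score(a, b))                 # likely duplicates score higher
print(dup_score(a, c))                 # unrelated records score lower
print(dup_score(a, c, threshold=0.5))  # pruned early, so 0.0
```

Because the averaging runs over the children of the first argument, this sketch is not symmetric; a real system would need to decide how to handle that, as well as how to weight missing or repeated elements.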

Authors and Affiliations

A. Baladhandayutham, S. Roselin Mary

  • EP ID EP142076
  • DOI 10.9790/0661-1628101105

How To Cite

A. Baladhandayutham, S. Roselin Mary (2014). An Effective Solution to Adequate and Operative Duplicate Detection in Stratified Data. IOSR Journals (IOSR Journal of Computer Engineering), 16(2), 101-105. https://europub.co.uk/articles/-A-142076