An Effective Solution to Adequate and Operative Duplicate Detection in Stratified Data

Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2014, Vol 16, Issue 2

Abstract

Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. Although there is a long line of work on identifying duplicates in relational data, only a few solutions focus on duplicate detection in more complex hierarchical structures, such as XML data. This paper presents a novel method for XML duplicate detection, called XMLDup. XMLDup uses a Bayesian network to determine the probability of two XML elements being duplicates, considering not only the information within the elements but also the way that information is structured. In addition, to improve the efficiency of the network evaluation, a novel pruning strategy is presented that is capable of significant gains over the unoptimized version of the algorithm. Experiments show that the Bayesian network approach achieves high precision and recall scores on several data sets. XMLDup is also able to outperform another state-of-the-art duplicate detection solution, both in terms of efficiency, by up to 80%, and of effectiveness.
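
The idea of scoring two XML elements by both their values and their structure, and of abandoning a comparison early once it cannot succeed, can be illustrated with a minimal sketch. This is not the XMLDup algorithm or its Bayesian network: the function names, the averaging rule for combining child scores, and the use of a plain string-similarity ratio as a stand-in for a per-value probability are all assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def text_similarity(a, b):
    """String similarity in [0, 1]; a stand-in for a per-value duplicate probability."""
    left = (a or "").strip().lower()
    right = (b or "").strip().lower()
    return SequenceMatcher(None, left, right).ratio()


def duplicate_score(x, y, threshold=0.0):
    """Hypothetical score in [0, 1] for two elements.

    Leaves are compared by text; inner nodes average the best score found
    for each child tag (text of inner nodes is ignored in this sketch).
    Returns 0.0 early once the threshold can no longer be reached.
    """
    if x.tag != y.tag:
        return 0.0
    if len(x) == 0 and len(y) == 0:
        return text_similarity(x.text, y.text)
    tags = sorted({c.tag for c in x} | {c.tag for c in y})
    total = 0.0
    for i, tag in enumerate(tags):
        xs, ys = x.findall(tag), y.findall(tag)
        # Best pairing among same-tag children; 0.0 when one side lacks the tag.
        total += max((duplicate_score(a, b) for a in xs for b in ys), default=0.0)
        # Pruning: optimistically assume every remaining tag scores 1.0.
        upper_bound = (total + (len(tags) - i - 1)) / len(tags)
        if upper_bound < threshold:
            return 0.0
    return total / len(tags)


first = ET.fromstring("<movie><title>The Matrix</title><year>1999</year></movie>")
second = ET.fromstring("<movie><title>Casablanca</title><year>1942</year></movie>")
print(duplicate_score(first, first))
print(duplicate_score(first, second, threshold=0.99))
```

With a high threshold, the second comparison stops after the titles disagree, because even a perfect match on the remaining child could not lift the average above the threshold. That early exit is the kind of saving a pruning strategy aims for.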

Authors and Affiliations

A. Baladhandayutham, S. Roselin Mary

  • EP ID EP142076
  • DOI 10.9790/0661-1628101105

How To Cite

A. Baladhandayutham, S. Roselin Mary (2014). An Effective Solution to Adequate and Operative Duplicate Detection in Stratified Data. IOSR Journals (IOSR Journal of Computer Engineering), 16(2), 101-105. https://europub.co.uk/articles/-A-142076