An Effective Solution to Adequate and Operative Duplicate Detection in Stratified Data

Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2014, Vol 16, Issue 2

Abstract

Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. Although there is a long line of work on identifying duplicates in relational data, only a few solutions focus on duplicate detection in more complex hierarchical structures, such as XML data. This paper presents a novel method for XML duplicate detection, called XMLDup. XMLDup uses a Bayesian network to determine the probability of two XML elements being duplicates, considering not only the information within the elements but also the way that information is structured. In addition, to improve the efficiency of the network evaluation, a novel pruning strategy is presented that is capable of significant gains over the unoptimized version of the algorithm. Experiments show that the Bayesian network approach achieves high precision and recall scores on several data sets. XMLDup also outperforms another state-of-the-art duplicate detection solution, in terms of both efficiency (by up to 80%) and effectiveness.
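The idea of scoring two XML elements by both their values and their structure, and of abandoning a comparison early, can be illustrated with a small sketch. This is not the authors' Bayesian network: the abstract does not give the network's formulas, so the code below assumes a much simpler stand-in in which leaf values are compared with a string similarity, child scores are multiplied together, and a comparison is pruned as soon as the running score drops below a threshold. The names `text_similarity` and `duplicate_score` are hypothetical.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def text_similarity(a, b):
    """String similarity in [0, 1]; an assumed stand-in for a per-value duplicate probability."""
    a = (a or "").strip().lower()
    b = (b or "").strip().lower()
    return SequenceMatcher(None, a, b).ratio()


def duplicate_score(e1, e2, threshold=0.0):
    """Return a score in [0, 1] that e1 and e2 describe the same object.

    Assumption: the score of a node is the product of the scores of its
    shared child tags. Every factor is at most 1, so the running product
    can only shrink; once it falls below `threshold` the comparison is
    pruned and 0.0 is returned.
    """
    if e1.tag != e2.tag:
        return 0.0
    children1, children2 = list(e1), list(e2)
    # Leaf case: compare the text values directly.
    if not children1 or not children2:
        return text_similarity(e1.text, e2.text)
    shared = {c.tag for c in children1} & {c.tag for c in children2}
    if not shared:
        return 0.0
    score = 1.0
    for tag in sorted(shared):
        group1 = [c for c in children1 if c.tag == tag]
        group2 = [c for c in children2 if c.tag == tag]
        # For each child on the left, keep its best match on the right.
        best = [max(duplicate_score(c1, c2) for c2 in group2) for c1 in group1]
        score *= sum(best) / len(best)
        if score < threshold:
            return 0.0  # pruned: remaining factors cannot raise the score
    return score


if __name__ == "__main__":
    a = ET.fromstring("<movie><title>The Matrix</title><year>1999</year></movie>")
    b = ET.fromstring("<movie><title>The Matrx</title><year>1999</year></movie>")
    print(duplicate_score(a, b, threshold=0.5))
```

The multiplication rule is what makes the early exit safe in this sketch: because no factor exceeds 1, a partial product below the threshold is an upper bound on the final score.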

Authors and Affiliations

A. Baladhandayutham, S. Roselin Mary

  • EP ID EP142076
  • DOI 10.9790/0661-1628101105

How To Cite

A. Baladhandayutham, S. Roselin Mary (2014). An Effective Solution to Adequate and Operative Duplicate Detection in Stratified Data. IOSR Journals (IOSR Journal of Computer Engineering), 16(2), 101-105. https://europub.co.uk/articles/-A-142076