An Effective Solution to Adequate and Operative Duplicate Detection in Stratified Data
Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2014, Vol 16, Issue 2
Abstract
Data mining is considered the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. Although there is a long line of work on identifying duplicates in relational data, only a few solutions focus on duplicate detection in more complex hierarchical structures, such as XML data. This paper presents a novel method for XML duplicate detection, called XMLDup. XMLDup uses a Bayesian network to determine the probability of two XML elements being duplicates, considering not only the information within the elements but also the way that information is structured. In addition, to improve the efficiency of the network evaluation, a novel pruning strategy, capable of significant gains over the unoptimized version of the algorithm, is presented. Experiments show that the Bayesian network approach achieves high precision and recall scores on several data sets. XMLDup is also able to outperform another state-of-the-art duplicate detection solution, both in terms of efficiency (up to 80%) and of effectiveness.
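The abstract describes XMLDup's Bayesian network and its pruning strategy only in outline. The Python sketch below illustrates both ideas at toy scale; it is not the authors' implementation. The `XmlNode` class and every function name are hypothetical, `difflib.SequenceMatcher` stands in for whatever string similarity the real system uses, and the combination rules (mean similarity over shared values, best-match average over children, product of the two) are assumptions chosen for illustration rather than XMLDup's actual conditional probabilities. Because every factor lies in [0, 1], the value-level probability is an upper bound on the final score, so a pair whose values already fall below the threshold can be rejected without evaluating its children, which is the intuition behind pruning the network evaluation.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Dict, List


@dataclass
class XmlNode:
    """Hypothetical in-memory XML element: a tag, its text/attribute values, and child elements."""
    tag: str
    values: Dict[str, str] = field(default_factory=dict)
    children: List["XmlNode"] = field(default_factory=list)


def value_sim(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]; SequenceMatcher stands in for any metric."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def values_prob(x: XmlNode, y: XmlNode) -> float:
    """Assumed rule: mean similarity over the value fields both elements share."""
    shared = x.values.keys() & y.values.keys()
    if not shared:
        return 1.0  # no value evidence: neutral factor under the product rule
    return sum(value_sim(x.values[k], y.values[k]) for k in shared) / len(shared)


def children_prob(x: XmlNode, y: XmlNode) -> float:
    """Assumed rule: match each child of x to its most similar child of y, then average."""
    if not x.children or not y.children:
        return 1.0  # no structure to compare on one side: neutral factor
    best = [max(dup_prob(cx, cy) for cy in y.children) for cx in x.children]
    return sum(best) / len(best)


def dup_prob(x: XmlNode, y: XmlNode) -> float:
    """Full evaluation: P(duplicate) = P(values match) * P(children match)."""
    if x.tag != y.tag:
        return 0.0  # elements with different tags are never treated as duplicates
    return values_prob(x, y) * children_prob(x, y)


def is_duplicate(x: XmlNode, y: XmlNode, threshold: float = 0.7) -> bool:
    """Threshold test with pruning: skip the recursive child evaluation when it cannot matter."""
    if x.tag != y.tag:
        return False
    p_values = values_prob(x, y)
    # children_prob() never exceeds 1, so p_values is an upper bound on the final product.
    if p_values < threshold:
        return False  # pruned: the children subtree is never evaluated
    return p_values * children_prob(x, y) >= threshold


a = XmlNode("movie", {"title": "The Matrix", "year": "1999"},
            [XmlNode("actor", {"name": "Keanu Reeves"})])
b = XmlNode("movie", {"title": "The Matrix", "year": "1999"},
            [XmlNode("actor", {"name": "Keanu Reves"})])  # misspelled actor name
c = XmlNode("movie", {"title": "Heat", "year": "1995"},
            [XmlNode("actor", {"name": "Al Pacino"})])

print(is_duplicate(a, b))  # True: a typo in a child value still yields a high score
print(is_duplicate(a, c))  # False: rejected at the value level, children never compared
```

The product rule is what makes this pruning safe in the sketch: multiplying by a factor no greater than 1 can only lower the score, so skipping the children never rejects a pair that full evaluation would have accepted.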
Authors and Affiliations
A. Baladhandayutham, S. Roselin Mary