Deriving the Probability with Machine Learning and Efficient Duplicate Detection in Hierarchical Objects

Journal Title: INTERNATIONAL JOURNAL OF COMPUTER TRENDS & TECHNOLOGY - Year 2014, Vol 7, Issue 2

Abstract

Duplicate detection is an important task in data mining: it determines whether multiple records or data objects represent the same real-world entity. Real-world duplicates are multiple representations of the same real-world object. Duplicate detection can be applied in many settings and is especially important in databases. Techniques for detecting duplicates in a single relation have been extended to the hierarchical structure of XML data. An existing system, XMLDup, uses a Bayesian network to determine the probability that two XML elements are duplicates, taking into consideration not only the information within the elements but also the way that information is structured. In that Bayesian-network-based system the conditional probability values are derived manually, which makes it less efficient than an approach that learns them with machine learning. The proposed system detects duplicates among XML data and XML objects whose input files use different structural representations. It derives the conditional probabilities by applying support vector machine (SVM) models, with their associated learning algorithms, to XML data objects that may be duplicates. The method takes a set of XML data as input and predicts a conditional probability value for each data item in the hierarchical structure. According to the authors, the proposed SVM-based classification makes duplicate detection both more efficient and more effective.
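The idea of learning a duplicate probability from labelled element pairs, rather than setting it by hand, can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the function names, the `title` and `year` tags, the made-up training values, the use of `difflib` string similarity as features, and the fixed logistic curve that turns the SVM decision value into a number between 0 and 1 (a fitted calibration such as Platt scaling would normally be used instead).

```python
import difflib
import math
import xml.etree.ElementTree as ET


def leaf_similarities(a, b, tags):
    """One string-similarity score in [0, 1] per child tag of two XML elements."""
    scores = []
    for tag in tags:
        text_a = (a.findtext(tag) or "").strip().lower()
        text_b = (b.findtext(tag) or "").strip().lower()
        scores.append(difflib.SequenceMatcher(None, text_a, text_b).ratio())
    return scores


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=500):
    """Fit a linear SVM (hinge loss + L2 penalty) by sub-gradient descent.

    Labels in y are +1 (duplicate pair) or -1 (non-duplicate pair).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1.0 - eta * lam) for wj in w]  # L2 shrinkage
            if margin < 1.0:  # inside the margin: take a hinge-loss step
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b


def duplicate_probability(w, b, features):
    """Squash the SVM decision value into (0, 1) with a fixed logistic curve."""
    score = sum(wj * xj for wj, xj in zip(w, features)) + b
    return 1.0 / (1.0 + math.exp(-score))


# Made-up training pairs: [title similarity, year similarity] and a label.
TRAIN_X = [[0.9, 1.0], [0.8, 1.0], [1.0, 0.75],
           [0.2, 0.0], [0.3, 0.25], [0.1, 0.0]]
TRAIN_Y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(TRAIN_X, TRAIN_Y)

# Two hypothetical XML objects that describe the same film differently.
movie_a = ET.fromstring("<movie><title>The Matrix</title><year>1999</year></movie>")
movie_b = ET.fromstring("<movie><title>Matrix, The</title><year>1999</year></movie>")
features = leaf_similarities(movie_a, movie_b, ["title", "year"])
print(duplicate_probability(w, b, features))
```

In a hierarchical setting, a value like this would be computed per node and then combined up the tree; the sketch covers only a single level of child elements.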

Authors and Affiliations

D. Nithya, K. Karthickeyan

How To Cite

D. Nithya, K. Karthickeyan (2014). Deriving the Probability with Machine Learning and Efficient Duplicate Detection in Hierarchical Objects. INTERNATIONAL JOURNAL OF COMPUTER TRENDS & TECHNOLOGY, 7(2), 75-80. https://europub.co.uk/articles/-A-152024