Deriving the Probability with Machine Learning and Efficient Duplicate Detection in Hierarchical Objects

Journal Title: INTERNATIONAL JOURNAL OF COMPUTER TRENDS & TECHNOLOGY - Year 2014, Vol 7, Issue 2

Abstract

Duplicate detection is an important task in data mining: it determines whether given data objects are duplicates. Real-world duplicates are multiple representations of the same real-world object. Duplicate detection can be performed in many settings and is especially important in databases. To support this, duplicate detection for hierarchical structures extends the single-relation approach to XML data. The existing system considered in this work uses a method called XMLDup. XMLDup uses a Bayesian network to establish the conditional probability of two XML elements being duplicates, taking into consideration not [...]. In the Bayesian-network-based system the conditional probability values are derived manually, which makes it less efficient than an approach based on machine learning. To improve the efficiency of duplicate detection, the proposed system detects duplicates in XML data and XML objects whose input files use different structural representations. It derives the conditional probabilities by applying support vector machine (SVM) models, with their associated learning algorithms, to analyze XML duplicate data objects. The method takes a number of XML data items as input and predicts the conditional probability value for each data item in the hierarchical structure. The authors conclude that the proposed SVM-based classification detects duplicates more efficiently and effectively.
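
The idea in the abstract, learning a duplicate probability for a pair of XML elements rather than setting it by hand, can be illustrated with a small sketch. It is not the authors' implementation. The field names (`title`, `artist`), the toy training pairs, the string-similarity features, and the fixed sigmoid used to turn the SVM score into a probability-like value are all assumptions made for illustration; a real system would fit that mapping (for example with Platt scaling) and would feed the result into the hierarchical model.

```python
import math
import random
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher

FIELDS = ["title", "artist"]  # hypothetical field names, not from the paper


def pair_features(xml_a, xml_b, fields=FIELDS):
    """One string-similarity score in [0, 1] per field for two XML elements."""
    a, b = ET.fromstring(xml_a), ET.fromstring(xml_b)
    feats = []
    for name in fields:
        # ".//name" finds the field at any depth, so differently nested
        # representations of the same object can still be compared.
        ta = (a.findtext(".//" + name) or "").strip().lower()
        tb = (b.findtext(".//" + name) or "").strip().lower()
        feats.append(SequenceMatcher(None, ta, tb).ratio() if ta and tb else 0.0)
    return feats


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200, seed=0):
    """Linear SVM via sub-gradient descent on the regularised hinge loss.

    Labels must be +1 (duplicate) or -1 (not a duplicate).
    """
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage
            if y[i] * score < 1.0:  # inside the margin: hinge sub-gradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b


def duplicate_probability(xml_a, xml_b, w, b):
    """Squash the SVM score with a fixed sigmoid (assumption: not calibrated)."""
    x = pair_features(xml_a, xml_b)
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-score))


def cd(title, artist):
    return "<cd><title>%s</title><artist>%s</artist></cd>" % (title, artist)


# Toy labelled pairs (invented for this sketch).
abbey, thriller, blue = (cd("Abbey Road", "The Beatles"),
                         cd("Thriller", "Michael Jackson"),
                         cd("Kind of Blue", "Miles Davis"))
pairs = [
    (abbey, cd("Abbey Rd", "Beatles"), 1),
    (thriller, thriller, 1),
    (blue, cd("Kind Of Blue", "miles davis"), 1),
    (abbey, thriller, -1),
    (blue, thriller, -1),
    (blue, abbey, -1),
]
X = [pair_features(a, b) for a, b, _ in pairs]
y = [label for _, _, label in pairs]
w, b = train_linear_svm(X, y)

# Same object, different structure: the title is nested one level deeper.
nested = ("<disc><info><title>Thriller</title></info>"
          "<artist>Michael Jackson</artist></disc>")
p_dup = duplicate_probability(thriller, nested, w, b)
p_diff = duplicate_probability(abbey, thriller, w, b)
```

Here `p_dup` scores a pair that describes the same object with a different nesting, and `p_diff` scores two different objects. Searching for fields at any depth is one simple way to tolerate structural differences between input files; it ignores where a field sits in the hierarchy, which a fuller treatment would take into account.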

Authors and Affiliations

D. Nithya, K. Karthickeyan

How To Cite

D. Nithya, K. Karthickeyan (2014). Deriving the Probability with Machine Learning and Efficient Duplicate Detection in Hierarchical Objects. INTERNATIONAL JOURNAL OF COMPUTER TRENDS & TECHNOLOGY, 7(2), 75-80. https://europub.co.uk/articles/-A-152024