Deriving the Probability with Machine Learning and Efficient Duplicate Detection in Hierarchical Objects

Journal Title: INTERNATIONAL JOURNAL OF COMPUTER TRENDS & TECHNOLOGY - Year 2014, Vol 7, Issue 2

Abstract

Duplicate detection is an important task in data mining: it determines whether given data objects are duplicates. Real-world duplicates are multiple representations of the same real-world object. Duplicates can be detected in many settings, and detection is particularly important in databases. This work applies duplicate detection to hierarchically structured XML data rather than to a single relation. The existing system is a method called XMLDup, which uses a Bayesian network to determine the probability of two XML elements being duplicates. In the Bayesian network based system, the conditional probability values are derived manually, which makes it less efficient than a machine learning based approach. The proposed system detects duplicates in XML data and XML objects whose input files have different structural representations. It derives the conditional probabilities by applying support vector machine (SVM) models, with their associated learning algorithms, to analyze duplicate XML data objects. The method takes a set of XML data as input and predicts the conditional probability value for each data item in the hierarchical structure. The proposed SVM-based classification is reported to perform better, giving duplicate detection that is both efficient and effective.
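
The idea of learning a duplicate probability from labelled pairs, instead of setting it by hand, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two similarity features, the function names (`pair_features`, `train_hinge_classifier`, `duplicate_probability`) and the toy training values are all assumptions made for illustration. The classifier is a linear model trained by subgradient descent on the SVM hinge loss (the L2 regularisation term is off by default), and the logistic function that turns the margin into a probability is an uncalibrated stand-in for a fitted calibration step.

```python
import math
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def pair_features(xml_a, xml_b):
    """Return [text similarity, tag-set overlap] for two XML fragments."""
    a, b = ET.fromstring(xml_a), ET.fromstring(xml_b)

    def text_of(root):
        # Sort the leaf values so that sibling order does not matter.
        return " ".join(sorted(t.strip().lower() for t in root.itertext() if t.strip()))

    text_sim = SequenceMatcher(None, text_of(a), text_of(b)).ratio()
    tags_a = {e.tag for e in a.iter()}
    tags_b = {e.tag for e in b.iter()}
    tag_sim = len(tags_a & tags_b) / len(tags_a | tags_b)  # Jaccard overlap
    return [text_sim, tag_sim]


def decision_value(w, b, x):
    """Signed distance-like score of x from the separating hyperplane."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_hinge_classifier(X, y, epochs=2000, lr=0.1, lam=0.0):
    """Subgradient descent on the hinge loss; labels are +1 (duplicate) or -1.

    Set lam > 0 to add the usual L2 penalty of a soft-margin SVM.
    """
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        updated = False
        for xi, yi in zip(X, y):
            if yi * decision_value(w, b, xi) < 1:  # margin violated
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
                updated = True
            elif lam:
                w = [wj * (1 - lr * lam) for wj in w]  # regularisation shrinkage only
        if not updated and not lam:
            break  # every training pair already has margin >= 1
    return w, b


def duplicate_probability(w, b, x):
    """Squash the signed margin into (0, 1) with a logistic function."""
    return 1.0 / (1.0 + math.exp(-decision_value(w, b, x)))


# Toy, hand-labelled similarity vectors (illustrative values, not from the paper).
X = [[0.9, 1.0], [0.8, 1.0], [0.2, 1.0], [0.1, 0.5]]
y = [1, 1, -1, -1]
w, b = train_hinge_classifier(X, y)

# Two representations of the same person with a different element order.
a = "<person><name>John Smith</name><city>Boston</city></person>"
c = "<person><city>Boston</city><name>John Smith</name></person>"
p = duplicate_probability(w, b, pair_features(a, c))
```

Because the leaf values are sorted before comparison, the two `person` fragments yield identical feature values despite their different element order, and `p` comes out above 0.5. A realistic system would need many more labelled pairs, richer per-element features, and a proper probability calibration.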

Authors and Affiliations

D. Nithya, K. Karthickeyan

How To Cite

D. Nithya, K. Karthickeyan (2014). Deriving the Probability with Machine Learning and Efficient Duplicate Detection in Hierarchical Objects. INTERNATIONAL JOURNAL OF COMPUTER TRENDS & TECHNOLOGY, 7(2), 75-80. https://europub.co.uk/articles/-A-152024