Deriving the Probability with Machine Learning and Efficient Duplicate Detection in Hierarchical Objects

Journal Title: INTERNATIONAL JOURNAL OF COMPUTER TRENDS & TECHNOLOGY - Year 2014, Vol 7, Issue 2

Abstract

Duplicate detection is an important task in data mining: it determines whether given data objects are duplicates of one another. Real-world duplicates are multiple representations of the same real-world object. Duplicate detection can be applied in many settings and is particularly important in databases. To handle hierarchical structure, duplicate detection methods developed for a single relation are applied to XML data. An existing system, XMLDup, uses a Bayesian network to establish the conditional probability that two XML elements are duplicates. In the Bayesian-network-based system the conditional probability values are derived manually, which makes it less efficient than a machine-learning-based approach. The proposed system detects duplicates among XML data and XML objects whose input files use different structural representations. It derives the conditional probabilities with support vector machine (SVM) models and their associated learning algorithms, which analyze XML duplicate data objects. The method takes a set of XML data as input and predicts the conditional probability value for each data item in the hierarchical structure. The proposed SVM-based classification is reported to provide more efficient and effective duplicate detection.
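As a rough illustration of the idea in the abstract, the sketch below learns a duplicate score for pairs of XML elements with a linear SVM and maps that score to a probability-like value. It is a minimal sketch under several assumptions and not the authors' implementation: the `<movie>` records, the `title` and `year` fields, the string-similarity features, the sub-gradient training loop, and the fixed logistic mapping (a simplification of fitted calibration such as Platt scaling) are all hypothetical choices made for the example.

```python
import math
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher

FIELDS = ("title", "year")  # hypothetical child elements compared per pair


def movie(title, year):
    """Parse a tiny hypothetical <movie> record from an XML string."""
    return ET.fromstring(f"<movie><title>{title}</title><year>{year}</year></movie>")


def features(a, b):
    """String similarity in [0, 1] for each compared field of two elements."""
    sims = []
    for name in FIELDS:
        ta = (a.findtext(name) or "").strip().lower()
        tb = (b.findtext(name) or "").strip().lower()
        sims.append(SequenceMatcher(None, ta, tb).ratio())
    return sims


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=300):
    """Minimise L2-regularised hinge loss by plain sub-gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1.0 - eta * lam) for wj in w]  # regularisation step
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b


def duplicate_probability(a, b_elem, model):
    """Squash the SVM decision value through a fixed logistic function.

    Assumption: a fixed sigmoid stands in for a calibration step that
    would normally be fitted on held-out labelled pairs.
    """
    w, b = model
    score = sum(wj * xj for wj, xj in zip(w, features(a, b_elem))) + b
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical labelled pairs: +1 = duplicates, -1 = distinct objects.
TRAINING = [
    (movie("The Matrix", "1999"), movie("Matrix, The", "1999"), +1),
    (movie("Pulp Fiction", "1994"), movie("Pulp Fictoin", "1994"), +1),
    (movie("The Matrix", "1999"), movie("Finding Nemo", "2003"), -1),
    (movie("Pulp Fiction", "1994"), movie("The Godfather", "1972"), -1),
]
X = [features(a, b) for a, b, _ in TRAINING]
y = [label for _, _, label in TRAINING]
MODEL = train_linear_svm(X, y)

p = duplicate_probability(movie("The Matrix", "1999"), movie("the matrix", "1999"), MODEL)
print(f"P(duplicate) = {p:.2f}")
```

In a hierarchical setting, values produced this way could presumably replace manually chosen conditional probabilities at the nodes of the Bayesian network; how the paper wires the two together is not stated in the abstract.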

Authors and Affiliations

D. Nithya, K. Karthickeyan

How To Cite

D. Nithya, K. Karthickeyan (2014). Deriving the Probability with Machine Learning and Efficient Duplicate Detection in Hierarchical Objects. INTERNATIONAL JOURNAL OF COMPUTER TRENDS & TECHNOLOGY, 7(2), 75-80. https://europub.co.uk/articles/-A-152024