Record Matching Over Query Results Using Fuzzy Ontological Document Clustering

Journal Title: International Journal on Computer Science and Engineering - Year 2011, Vol 3, Issue 2

Abstract

Record matching is an essential step in duplicate detection because it identifies records that represent the same real-world entity. Supervised record matching methods require users to provide training data and therefore cannot be applied to web databases, where query results are generated on the fly. To overcome this problem, a new record matching method named Unsupervised Duplicate Elimination (UDE) is proposed for identifying and eliminating duplicates among records in dynamic query results. The central idea of this paper is to adjust the weights of record fields when calculating similarities among records. Two classifiers, a weighted component similarity summing classifier and a support vector machine classifier, are employed iteratively with UDE: the first classifier uses the field weights to match records from different data sources. With the matched records as the positive set and the non-duplicate records as the negative set, the second classifier identifies new duplicates. Then, a new methodology to automatically interpret and cluster knowledge documents using an ontology schema is presented. Moreover, a fuzzy logic control approach is used to match suitable document cluster(s) for given patents based on their derived ontological semantic webs. Thus, this paper takes advantage of the similarity among records from web databases and solves the online duplicate detection problem.
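The first classifier described in the abstract, which sums weighted per-field similarities, can be illustrated with a minimal sketch. This is not the authors' implementation: the field names, the weights, the threshold of 0.85, and the use of `difflib.SequenceMatcher` as the string similarity measure are all assumptions made for illustration. The iterative weight adjustment and the support vector machine step are omitted.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]. SequenceMatcher is a stand-in measure (assumption)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def record_similarity(r1: dict, r2: dict, weights: dict) -> float:
    """Weighted sum of per-field similarities; weights are assumed to sum to 1."""
    return sum(w * field_similarity(r1[f], r2[f]) for f, w in weights.items())


def match_records(source_a: list, source_b: list, weights: dict, threshold: float = 0.85) -> list:
    """Return (i, j, score) for cross-source pairs whose score reaches the threshold."""
    matches = []
    for i, r1 in enumerate(source_a):
        for j, r2 in enumerate(source_b):
            score = record_similarity(r1, r2, weights)
            if score >= threshold:
                matches.append((i, j, score))
    return matches


# Hypothetical field weights and records, chosen only for illustration.
weights = {"title": 0.6, "author": 0.3, "year": 0.1}
source_a = [
    {"title": "Record Matching Over Query Results", "author": "V. Vijayaraja", "year": "2011"},
]
source_b = [
    {"title": "Record matching over query results", "author": "V Vijayaraja", "year": "2011"},
    {"title": "Frequent Itemset Mining", "author": "A. Author", "year": "2009"},
]

for i, j, score in match_records(source_a, source_b, weights):
    print(f"source_a[{i}] matches source_b[{j}] with score {score:.3f}")
```

In a pipeline like the one the abstract outlines, the pairs returned here would serve as the positive set for the second classifier, and pairs falling below the threshold would be candidates for the negative set.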

Authors and Affiliations

V. Vijayaraja, R. Prasanna Kumar, M. A. Mukunthan, G. Bharathi Mohan

How To Cite

V. Vijayaraja, R. Prasanna Kumar, M. A. Mukunthan, G. Bharathi Mohan (2011). Record Matching Over Query Results Using Fuzzy Ontological Document Clustering. International Journal on Computer Science and Engineering, 3(2), 926-932. https://europub.co.uk/articles/-A-97303