A Method for Implementing Probabilistic Entity Resolution

Abstract

Deterministic and probabilistic matching are two approaches commonly used in Entity Resolution (ER) systems. While many users are familiar with writing and using Boolean rules for deterministic matching, fewer are familiar with the scoring rule configuration used to support probabilistic matching. This paper describes a method that uses deterministic matching to “bootstrap” probabilistic matching. It also examines the effectiveness of three commonly used strategies for mitigating the effect of missing values in probabilistic matching. The results are based on experiments using different sets of synthetically generated data processed with the OYSTER open source entity resolution system.

Authors and Affiliations

Awaad Alsarkhi, John R. Talburt

  • EP ID EP417568
  • DOI 10.14569/IJACSA.2018.091102

How To Cite

Awaad Alsarkhi, John R. Talburt (2018). A Method for Implementing Probabilistic Entity Resolution. International Journal of Advanced Computer Science & Applications, 9(11), 7-15. https://europub.co.uk/articles/-A-417568