Identifying and Extracting Named Entities from Wikipedia Database Using Entity Infoboxes

Abstract

This paper describes an approach to named entity classification based on Wikipedia article infoboxes. It identifies the three fundamental named entity types: Person, Location, and Organization. An entity is classified by matching the attributes extracted from its article infobox against core entity attributes built from Wikipedia Infobox Templates. Experimental results show that the classifier achieves accuracy and F-measure scores of 97%. Based on this approach, a database of around 1.6 million named entities of these three types was created from the 20140203 Wikipedia dump. Experiments on the CoNLL-2003 shared task named entity recognition (NER) dataset showed that the system performs strongly in comparison with three different state-of-the-art systems.
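
The attribute-matching idea in the abstract can be illustrated with a minimal sketch. The abstract does not list the paper's core attributes or its matching rule, so everything below is an assumption made for illustration: the attribute names in `CORE_ATTRIBUTES`, the simple overlap count used as a score, and the helper names `extract_attributes` and `classify` are all hypothetical.

```python
import re

# Hypothetical core attribute sets, chosen for illustration only.
# The paper builds its own sets from Wikipedia Infobox Templates.
CORE_ATTRIBUTES = {
    "Person": {"birth_date", "birth_place", "occupation", "spouse", "nationality"},
    "Location": {"coordinates", "population_total", "area_total_km2", "timezone", "elevation_m"},
    "Organization": {"founded", "headquarters", "industry", "key_people", "num_employees"},
}

# Matches infobox lines of the form "| attribute_name = value".
ATTRIBUTE_LINE = re.compile(r"^\s*\|\s*([A-Za-z0-9_ ]+?)\s*=", re.MULTILINE)


def extract_attributes(infobox_wikitext):
    """Return the normalized attribute names found in an infobox's wikitext."""
    return {
        name.strip().lower().replace(" ", "_")
        for name in ATTRIBUTE_LINE.findall(infobox_wikitext)
    }


def classify(infobox_wikitext, min_overlap=1):
    """Pick the type whose core attributes overlap most with the infobox.

    Returns None when no type reaches min_overlap (an assumed threshold).
    """
    attributes = extract_attributes(infobox_wikitext)
    scores = {
        entity_type: len(attributes & core)
        for entity_type, core in CORE_ATTRIBUTES.items()
    }
    best_type = max(scores, key=scores.get)
    return best_type if scores[best_type] >= min_overlap else None


sample = """{{Infobox person
| name = Alan Turing
| birth_date = {{birth date|1912|6|23}}
| birth_place = Maida Vale, London
| occupation = Mathematician
}}"""

print(classify(sample))  # prints "Person"
```

In this sketch an article whose infobox shares no attribute with any core set is left unclassified rather than forced into one of the three types; whether the paper handles such cases the same way is not stated in the abstract.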

Authors and Affiliations

Muhidin Mohamed, Mourad Oussalah

  • EP ID EP147492
  • DOI 10.14569/IJACSA.2014.050725

How To Cite

Muhidin Mohamed, Mourad Oussalah (2014). Identifying and Extracting Named Entities from Wikipedia Database Using Entity Infoboxes. International Journal of Advanced Computer Science & Applications, 5(7), 164-169. https://europub.co.uk/articles/-A-147492