Identifying and Extracting Named Entities from Wikipedia Database Using Entity Infoboxes

Journal Title: International Journal of Advanced Computer Science & Applications - Year 2014, Vol 5, Issue 7

Abstract

This paper describes an approach to named entity classification based on Wikipedia article infoboxes. It identifies the three fundamental named entity types: Person, Location, and Organization. An entity is classified by matching the attributes extracted from its article infobox against core entity attributes built from Wikipedia Infobox Templates. Experimental results show that the classifier achieves accuracy and F-measure scores of 97%. Using this approach, a database of around 1.6 million named entities of these three types was created from the 20140203 Wikipedia dump. Experiments on the CoNLL2003 shared task named entity recognition (NER) dataset demonstrated the system's outstanding performance in comparison with three different state-of-the-art systems.
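
The attribute-matching idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the core attribute sets, the function names `extract_infobox_attributes` and `classify_entity`, the overlap-count scoring, and the `min_overlap` threshold are all assumptions chosen for illustration. The paper derives its core attributes from Wikipedia Infobox Templates, and the abstract does not specify its matching rule.

```python
import re

# Hypothetical core attribute sets, for illustration only. The paper builds
# its own core attributes from Wikipedia Infobox Templates.
CORE_ATTRIBUTES = {
    "Person": {"birth_date", "birth_place", "occupation", "spouse", "nationality"},
    "Location": {"coordinates", "population_total", "area_total_km2", "timezone", "elevation_m"},
    "Organization": {"founded", "headquarters", "key_people", "industry", "num_employees"},
}

# Matches infobox lines of the form "| attribute_name = value".
ATTRIBUTE_LINE = re.compile(r"^\s*\|\s*([A-Za-z0-9_]+)\s*=", re.MULTILINE)


def extract_infobox_attributes(wikitext):
    """Return the set of lower-cased attribute names found in infobox wikitext."""
    return {match.group(1).lower() for match in ATTRIBUTE_LINE.finditer(wikitext)}


def classify_entity(attributes, min_overlap=2):
    """Pick the type whose core attributes overlap most with the entity's.

    Returns None when the best overlap is below min_overlap (an assumed
    threshold, not a value taken from the paper).
    """
    scores = {etype: len(attributes & core) for etype, core in CORE_ATTRIBUTES.items()}
    best_type = max(scores, key=scores.get)
    return best_type if scores[best_type] >= min_overlap else None


sample = """{{Infobox person
| name = Ada Lovelace
| birth_date = 10 December 1815
| birth_place = London, England
| occupation = Mathematician
}}"""

print(classify_entity(extract_infobox_attributes(sample)))  # Person
```

A plain overlap count keeps the sketch easy to follow; a real system would likely need attribute weighting and handling of template-specific attribute name variants.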

Authors and Affiliations

Muhidin Mohamed, Mourad Oussalah

Keywords

named entity identification; Wikipedia infobox; infobox templates; Named Entity Classification (NEC)

EP ID EP147492
DOI 10.14569/IJACSA.2014.050725