DBpedia based Ontological Concepts Driven Information Extraction from Unstructured Text

Abstract

This paper presents a knowledge-base concept-driven approach to named entity recognition (NER). The technique extracts information from news articles and links it to background concepts in a knowledge base, with a specific focus on extracting entity mentions from unstructured articles. Extraction is driven by the existing concepts in the DBpedia ontology, which represents the knowledge associated with the concepts present in the Wikipedia knowledge base. A collection of Wikipedia concepts has been extracted through the structured DBpedia ontology. For the unstructured text, Dawn news articles were scraped and preprocessed to build a corpus. Given an article, the proposed system identifies the entity mentions in the text and automatically links them to the corresponding concepts, which represent their respective pages on Wikipedia. The system is evaluated on three test collections of news articles from the politics, sports and entertainment domains. Experimental results for entity mentions are reported as precision, recall and F-measure; the precision of the extracted entity mentions yields the best results, with little variation in percent recall and F-measure. Additionally, facts associated with the extracted entity mentions are presented both as sentences and as Resource Description Framework (RDF) triples, to enhance the user’s understanding of the related facts presented in the article.
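The general idea of concept-driven mention extraction and its evaluation can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `GAZETTEER` dictionary, the sample sentence, and the function names `find_mentions` and `precision_recall_f1` are hypothetical, and simple dictionary lookup is assumed in place of whatever matching the paper actually uses. A real system would load labels from the DBpedia ontology rather than a hand-written table.

```python
import re

# Hypothetical mini-gazetteer: surface label -> DBpedia resource URI.
# Assumption for illustration only; not taken from the paper.
GAZETTEER = {
    "Imran Khan": "http://dbpedia.org/resource/Imran_Khan",
    "Pakistan": "http://dbpedia.org/resource/Pakistan",
    "Lahore": "http://dbpedia.org/resource/Lahore",
}


def find_mentions(text, gazetteer):
    """Return (label, uri) pairs for gazetteer labels found in text.

    Longer labels are tried first so that a multi-word label is not
    shadowed by a shorter label covering the same characters.
    """
    found = []
    taken = []  # character spans that have already been matched
    for label in sorted(gazetteer, key=len, reverse=True):
        pattern = r"\b" + re.escape(label) + r"\b"
        for m in re.finditer(pattern, text):
            overlaps = any(m.start() < e and s < m.end() for s, e in taken)
            if overlaps:
                continue
            taken.append((m.start(), m.end()))
            found.append((m.start(), label, gazetteer[label]))
    # Report mentions in the order they appear in the text.
    return [(label, uri) for _, label, uri in sorted(found)]


def precision_recall_f1(predicted, gold):
    """Compute precision, recall and F-measure over sets of mentions."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    sentence = "Imran Khan addressed a rally in Lahore."
    mentions = find_mentions(sentence, GAZETTEER)
    for label, uri in mentions:
        print(label, "->", uri)
    # Hypothetical gold annotations for the same sentence.
    gold = {"Imran Khan", "Lahore", "PTI"}
    print(precision_recall_f1({label for label, _ in mentions}, gold))
```

In this toy case every extracted mention is correct but one gold mention is absent from the gazetteer, so precision exceeds recall. That is the typical behaviour of a lookup-based extractor, whose coverage is bounded by the concepts it knows.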

Authors and Affiliations

Adeel Ahmed, Syed Saif ur Rahman



  • EP ID EP261397
  • DOI 10.14569/IJACSA.2017.080954

How To Cite

Adeel Ahmed, Syed Saif ur Rahman (2017). DBpedia based Ontological Concepts Driven Information Extraction from Unstructured Text. International Journal of Advanced Computer Science & Applications, 8(9), 411-418. https://europub.co.uk/articles/-A-261397