Automatic Keyphrase Extractor from Arabic Documents

Abstract

A keyphrase is a sentence, or part of a sentence, consisting of a sequence of words that expresses the meaning and purpose of a given paragraph. Keyphrase extraction is the task of identifying candidate keyphrases in a given document. Many applications, including text summarization, indexing, and characterization, rely on keyphrase extraction, and it is an essential step in improving the performance of information retrieval systems. The internet contains a massive number of documents, only some of which have manually assigned keyphrases. Arabic is an important world language, and the number of online Arabic documents is growing rapidly. Most of them have no manually assigned keyphrases, so users must scan each retrieved web document in full. To avoid this, keyphrases need to be assigned to each web document, either manually or automatically. This paper addresses the problem of identifying keyphrases in Arabic documents automatically. We present a novel algorithm, Automatic Keyphrases Extraction from Arabic (AKEA), that extracts keyphrases from Arabic text automatically. To test the algorithm, we collected a dataset of 100 documents from Arabic wiki and downloaded another 56 agricultural documents from the Food and Agriculture Organization of the United Nations (FAO). The evaluation results show that the system achieves 83% precision in identifying 2-word and 3-word keyphrases in the agricultural domain.
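The precision figure reported in the abstract can be illustrated with a minimal sketch. This is not the AKEA algorithm, which the abstract does not describe; it only shows how precision for 2-word and 3-word keyphrases might be computed, assuming precision is the fraction of extracted keyphrases that match manually assigned ones. The function names, the whitespace-based word count, and the sample phrases are all hypothetical.

```python
def precision(extracted, reference):
    """Fraction of distinct extracted keyphrases that appear in the reference set."""
    extracted_set = set(extracted)
    if not extracted_set:
        return 0.0
    return len(extracted_set & set(reference)) / len(extracted_set)


def by_length(phrases, n):
    """Keep only phrases with exactly n whitespace-separated words (an assumption)."""
    return [p for p in phrases if len(p.split()) == n]


# Hypothetical data, for illustration only (not taken from the paper).
extracted = [
    "soil erosion", "crop rotation", "water use",
    "food security policy", "plant disease control", "annual crop yield",
]
reference = [
    "soil erosion", "crop rotation", "irrigation",
    "food security policy", "plant disease control",
]

precision_2_word = precision(by_length(extracted, 2), reference)
precision_3_word = precision(by_length(extracted, 3), reference)
print(f"2-word precision: {precision_2_word:.2f}")
print(f"3-word precision: {precision_3_word:.2f}")
```

Exact string matching is the simplest choice here; a real evaluation on Arabic text would likely need normalization or stemming before comparison, which this sketch omits.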

Authors and Affiliations

Hassan Najadat, Ismail Hmeidi, Mohammed Al-Kabi, Maysa Bany Issa

  • EP ID EP112234
  • DOI 10.14569/IJACSA.2016.070226

How To Cite

Hassan Najadat, Ismail Hmeidi, Mohammed Al-Kabi, Maysa Bany Issa (2016). Automatic Keyphrase Extractor from Arabic Documents. International Journal of Advanced Computer Science & Applications, 7(2), 192-199. https://europub.co.uk/articles/-A-112234