Automatic Keyphrase Extractor from Arabic Documents

Abstract

A keyphrase is a sentence, or part of a sentence, whose sequence of words expresses the meaning and purpose of a given paragraph. Keyphrase extraction is the task of identifying candidate keyphrases in a given document. Many applications, including text summarization, indexing, and characterization, rely on keyphrase extraction, and it is an essential step in improving the performance of information retrieval systems. The internet contains a massive number of documents, only some of which have manually assigned keyphrases. Arabic is an important world language, and the number of online Arabic documents is growing rapidly. Most of these documents have no manually assigned keyphrases, so users must scan each retrieved web document in full. To avoid this, keyphrases need to be assigned to each web document, either manually or automatically. This paper addresses the problem of identifying keyphrases in Arabic documents automatically. We present a novel algorithm, Automatic Keyphrases Extraction from Arabic (AKEA), that extracts keyphrases from Arabic text automatically. To test the algorithm, we collected a dataset of 100 documents from Arabic wiki and downloaded another 56 agricultural documents from the Food and Agriculture Organization of the United Nations (FAO). The evaluation results show that the system achieves 83% precision in identifying 2-word and 3-word keyphrases in the agricultural domain.
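
The abstract does not describe how AKEA works internally, so the following is only a minimal sketch of the two ideas it names: 2-word and 3-word phrases, and precision as the evaluation measure. The function names `candidate_phrases` and `precision` are hypothetical, whitespace-separated tokens are assumed, and exact string matching against manually assigned keyphrases is an assumption about how precision might be scored, not the authors' procedure.

```python
from typing import Iterable, List, Set


def candidate_phrases(tokens: List[str], sizes: Iterable[int] = (2, 3)) -> List[str]:
    """Return every contiguous word sequence of the given sizes (2 and 3 by default).

    This is a generic n-gram enumeration, not the AKEA candidate selection step.
    """
    phrases: List[str] = []
    for n in sizes:
        for i in range(len(tokens) - n + 1):
            phrases.append(" ".join(tokens[i:i + n]))
    return phrases


def precision(extracted: Iterable[str], reference: Iterable[str]) -> float:
    """Fraction of distinct extracted keyphrases that also appear in the reference set.

    Assumes exact string matching; returns 0.0 when nothing was extracted.
    """
    extracted_set: Set[str] = set(extracted)
    if not extracted_set:
        return 0.0
    return len(extracted_set & set(reference)) / len(extracted_set)
```

Under these assumptions, a system that returns four distinct keyphrases for a document, two of which match the manually assigned ones, would score a precision of 0.5 on that document.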

Authors and Affiliations

Hassan Najadat, Ismail Hmeidi, Mohammed Al-Kabi, Maysa Bany Issa

  • EP ID EP112234
  • DOI 10.14569/IJACSA.2016.070226

How To Cite

Hassan Najadat, Ismail Hmeidi, Mohammed Al-Kabi, Maysa Bany Issa (2016). Automatic Keyphrase Extractor from Arabic Documents. International Journal of Advanced Computer Science & Applications, 7(2), 192-199. https://europub.co.uk/articles/-A-112234