A Novel Rule-Based Root Extraction Algorithm for Arabic Language

Abstract

Non-vocalized Arabic words are ambiguous: because short vowels are not written, the same word form may have different meanings and therefore more than one root. Many Arabic root extraction algorithms have been developed to extract the roots of non-vocalized Arabic words. However, most of them return only one root, and they achieve lower accuracy than reported when tested on different datasets. Arabic root extraction is urgently needed for applications such as information retrieval, indexing, text mining, text classification, data compression, spell checking, text summarization, question answering and machine translation. In this work, a new rule-based Arabic root extraction algorithm is developed to overcome the limitations of previous work. The proposed algorithm is compared with Khoja's algorithm, a well-known Arabic root extraction algorithm with high reported accuracy. Testing was conducted on the Thalji corpus, which was built specifically to test and compare Arabic root extraction algorithms. It contains 720,000 word-root pairs built from 12,000 roots, 430 prefixes, 320 suffixes and 4,320 patterns. The experimental results show that Khoja's algorithm achieved 63% accuracy, whereas the proposed algorithm achieved 94%.
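
The abstract does not spell out the proposed rules, so the following is only a minimal Python sketch of a generic rule-based extractor, not the authors' algorithm. It illustrates the two ideas the abstract relies on: stripping prefixes and suffixes, then matching the remaining stem against patterns built on the placeholder letters ف, ع, ل, and returning every candidate root instead of just one. The affix and pattern lists are tiny hypothetical stand-ins for the corpus's 430 prefixes, 320 suffixes and 4,320 patterns. The function names (`candidate_stems`, `match_pattern`, `extract_roots`, `accuracy`) and the optional `known_roots` lexicon are also hypothetical. The scoring rule, which counts a word as correct when its gold root appears among the candidates, is an assumption about how a multi-root extractor might be evaluated.

```python
from __future__ import annotations

# Hypothetical, deliberately tiny rule sets (illustration only).
PREFIXES = ["وال", "بال", "لل", "ال", "و", "ب", "ف"]
SUFFIXES = ["ات", "ون", "ين", "ها", "هم", "ة"]
# Patterns use ف, ع, ل as placeholders for the three root consonants, in order.
PATTERNS = ["فاعل", "فعال", "فعيل", "مفعول", "افتعل", "تفاعل"]
PLACEHOLDERS = "فعل"
MIN_STEM = 3


def candidate_stems(word: str) -> set[str]:
    """Strip at most one prefix and one suffix, keeping stems of 3+ letters."""
    stems = set()
    for prefix in [""] + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in [""] + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            stem = rest[: len(rest) - len(suffix)]
            if len(stem) >= MIN_STEM:
                stems.add(stem)
    return stems


def match_pattern(stem: str, pattern: str) -> str | None:
    """Return the letters at placeholder positions if stem fits pattern."""
    if len(stem) != len(pattern):
        return None
    root = []
    for s, p in zip(stem, pattern):
        if p in PLACEHOLDERS:
            root.append(s)  # placeholder position: this letter belongs to the root
        elif s != p:
            return None  # fixed pattern letter does not match the stem
    return "".join(root)


def extract_roots(word: str, known_roots: set[str] | None = None) -> list[str]:
    """Return all candidate triliteral roots, optionally filtered by a lexicon."""
    roots = set()
    for stem in candidate_stems(word):
        if len(stem) == 3:
            roots.add(stem)  # a bare three-letter stem is itself a candidate
        for pattern in PATTERNS:
            root = match_pattern(stem, pattern)
            if root is not None:
                roots.add(root)
    if known_roots is not None:
        roots &= known_roots
    return sorted(roots)


def accuracy(pairs: list[tuple[str, str]], known_roots: set[str] | None = None) -> float:
    """Fraction of (word, gold_root) pairs whose gold root is among the candidates."""
    if not pairs:
        return 0.0
    hits = sum(1 for word, gold in pairs if gold in extract_roots(word, known_roots))
    return hits / len(pairs)


if __name__ == "__main__":
    print(extract_roots("الكاتب"))          # ['كتب']
    print(extract_roots("والد"))            # ['الد', 'ولد']
    print(extract_roots("والد", {"ولد"}))   # ['ولد']
```

For "والد" (father, root و-ل-د), the rules return both ولد and a spurious الد, because they also treat the leading و as the conjunction prefix. Filtering against a root lexicon removes the spurious candidate. A hollow verb such as قال (root ق-و-ل) is missed entirely, so a toy set of five word-root pairs containing it scores 0.8. Weak roots of this kind are the cases a fuller rule set has to handle explicitly.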

Authors and Affiliations

Nisrean Thalji, Nik Adilah Hanin, Walid Bani Hani, Sohair Al-Hakeem, Zyad Thalji


  • EP ID EP407454
  • DOI 10.14569/IJACSA.2018.091015

How To Cite

Nisrean Thalji, Nik Adilah Hanin, Walid Bani Hani, Sohair Al-Hakeem, Zyad Thalji (2018). A Novel Rule-Based Root Extraction Algorithm for Arabic Language. International Journal of Advanced Computer Science & Applications, 9(10), 120-128. https://europub.co.uk/articles/-A-407454