A Novel Rule-Based Root Extraction Algorithm for Arabic Language

Abstract

Non-vocalized Arabic words are ambiguous: the same written form can carry different meanings and may therefore have more than one root. Many algorithms have been proposed to extract the roots of non-vocalized Arabic words. However, most of them return only one root, and their accuracy is lower than originally reported when they are tested on different datasets. Arabic root extraction is urgently needed in applications such as information retrieval, indexing, text mining, text classification, data compression, spell checking, text summarization, question answering, and machine translation. This work develops a new rule-based Arabic root extraction algorithm that aims to overcome the limitations of previous work. The proposed algorithm is compared with Khoja's algorithm, a well-known Arabic root extraction algorithm that produces high accuracy. Testing was conducted on the Thalji corpus, which was built mainly to test and compare Arabic root extraction algorithms. It contains 720,000 word-root pairs from 12,000 roots, 430 prefixes, 320 suffixes, and 4,320 patterns. The experimental results show that Khoja's algorithm achieved 63% accuracy, while the proposed algorithm achieved 94%.
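
The abstract does not spell out the rules themselves, so the following is only a minimal sketch of what rule-based root extraction with multiple candidate roots can look like. It is not the authors' algorithm: the affix lists, the infix letters, the tiny root lexicon, and the function names `candidate_roots` and `accuracy` are all hypothetical and chosen for illustration. The sketch strips at most one prefix and one suffix, optionally drops one weak infix letter, and keeps every result found in the lexicon. Accuracy is then the share of word-root pairs whose gold root appears among the candidates.

```python
# Toy illustration only. The affixes, infix letters, and root lexicon below
# are hypothetical; they are NOT the rules or the corpus used in the paper.
PREFIXES = ["ال", "و", "ب", "م", "ي", "ت"]
SUFFIXES = ["ون", "ات", "ة", "ه", "ي"]
INFIX_LETTERS = {"ا", "و", "ي"}
KNOWN_ROOTS = {"كتب", "درس", "علم"}


def candidate_roots(word):
    """Return all known trilateral roots reachable from `word` (sorted list)."""
    found = set()
    # Try the word as-is plus every matching single prefix.
    for prefix in [""] + [p for p in PREFIXES if word.startswith(p)]:
        rest = word[len(prefix):]
        # Try the remainder as-is plus every matching single suffix.
        for suffix in [""] + [s for s in SUFFIXES if rest.endswith(s)]:
            stem = rest[:len(rest) - len(suffix)] if suffix else rest
            if len(stem) == 3 and stem in KNOWN_ROOTS:
                found.add(stem)
            elif len(stem) == 4:
                # Drop one weak letter and check the three letters that remain.
                for i, ch in enumerate(stem):
                    reduced = stem[:i] + stem[i + 1:]
                    if ch in INFIX_LETTERS and reduced in KNOWN_ROOTS:
                        found.add(reduced)
    return sorted(found)


def accuracy(pairs):
    """Fraction of (word, gold_root) pairs whose gold root is among the candidates."""
    hits = sum(1 for word, root in pairs if root in candidate_roots(word))
    return hits / len(pairs)


if __name__ == "__main__":
    pairs = [
        ("الكاتب", "كتب"),
        ("مدرسة", "درس"),
        ("يعلمون", "علم"),
        ("استخراج", "خرج"),  # not covered by the toy rules
    ]
    for word, _ in pairs:
        print(word, candidate_roots(word))
    print("accuracy:", accuracy(pairs))
```

Returning a list rather than a single string reflects the abstract's point that a non-vocalized word may have more than one valid root. A real system would need far larger affix and pattern inventories than this toy lexicon.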

Authors and Affiliations

Nisrean Thalji, Nik Adilah Hanin, Walid Bani Hani, Sohair Al-Hakeem, Zyad Thalji

  • EP ID EP407454
  • DOI 10.14569/IJACSA.2018.091015

How To Cite

Nisrean Thalji, Nik Adilah Hanin, Walid Bani Hani, Sohair Al-Hakeem, Zyad Thalji (2018). A Novel Rule-Based Root Extraction Algorithm for Arabic Language. International Journal of Advanced Computer Science & Applications, 9(10), 120-128. https://europub.co.uk/articles/-A-407454