Fine-Grained Quran Dataset

Abstract

Extracting knowledge from text documents has become one of the main topics in Natural Language Processing (NLP) in the era of information explosion. Arabic NLP is considered immature for several reasons, including the scarcity of available resources. At the same time, automatically extracting reliable knowledge from specialized sources such as holy books is a highly challenging task, but one of great benefit. In this context, this paper provides a comprehensive Quranic dataset as the first part (the foundation) of ongoing research that aims to lay the groundwork for approaches and applications that explore the holy Quran. The paper presents the algorithms and approaches designed to extract aggregated data from massive Arabic text sources, including the holy Quran and closely associated books. The text of the holy Quran is transformed into structured, multi-dimensional data records at the chapter level, the word level, and then the character level. These records are linked with interpretations and meanings, parsing, translations, intonation, and the roots and stems of words, all drawn from authentic and reliable sources. The final dataset is provided as Excel sheets and database records. The paper also presents models of the dataset at all levels. The Quranic dataset was designed to suit database, data mining, text mining, and Artificial Intelligence applications; it is also designed to serve as a comprehensive encyclopedia of the holy Quran and the Quranic science books.
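
The chapter, word, and character levels described in the abstract can be pictured with a minimal sketch. It is an illustration only, not the authors' schema: the class names, field names, and the `build_chapter` and `flatten_rows` helpers are all hypothetical, whitespace tokenization is an assumption, and the linked fields (root, stem, translation) are left empty rather than filled with invented values.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CharRecord:
    position: int  # 1-based position of the character within its word
    char: str


@dataclass
class WordRecord:
    position: int  # 1-based position of the word within the chapter text
    text: str
    root: str = ""         # hypothetical linked field, left empty here
    stem: str = ""         # hypothetical linked field, left empty here
    translation: str = ""  # hypothetical linked field, left empty here
    chars: List[CharRecord] = field(default_factory=list)


@dataclass
class ChapterRecord:
    number: int
    name: str
    words: List[WordRecord] = field(default_factory=list)


def build_chapter(number: int, name: str, text: str) -> ChapterRecord:
    """Split text into word records, and each word into character records."""
    chapter = ChapterRecord(number=number, name=name)
    for w_pos, token in enumerate(text.split(), start=1):
        word = WordRecord(position=w_pos, text=token)
        for c_pos, ch in enumerate(token, start=1):
            word.chars.append(CharRecord(position=c_pos, char=ch))
        chapter.words.append(word)
    return chapter


def flatten_rows(chapter: ChapterRecord) -> List[Dict[str, object]]:
    """Produce one flat row per character, as it might appear in a sheet or table."""
    rows: List[Dict[str, object]] = []
    for word in chapter.words:
        for c in word.chars:
            rows.append({
                "chapter": chapter.number,
                "word_pos": word.position,
                "word": word.text,
                "char_pos": c.position,
                "char": c.char,
            })
    return rows


# Usage: the first two words of chapter 1, written without diacritics.
sample_text = "بسم الله"
chapter = build_chapter(1, "Al-Fatiha", sample_text)
rows = flatten_rows(chapter)
```

Keeping the nested records and the flat rows separate reflects the two output forms the abstract mentions (spreadsheet-style sheets and database records); a real pipeline would also need to handle diacritics and verse boundaries, which this sketch ignores.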

Authors and Affiliations

Mohamed Hegazi, Anwer Hilal, Mohammad Alhawarat

  • EP ID EP106690
  • DOI 10.14569/IJACSA.2015.061241

How To Cite

Mohamed Hegazi, Anwer Hilal, Mohammad Alhawarat (2015). Fine-Grained Quran Dataset. International Journal of Advanced Computer Science & Applications, 6(12), 308-313. https://europub.co.uk/articles/-A-106690