QTID: Quran Text Image Dataset

Abstract

Improving the accuracy of Arabic text recognition in images requires a large, modern dataset, since data is the fuel for many modern machine learning models. This paper proposes a new dataset, the Quran Text Image Dataset (QTID), the first Arabic dataset that includes Arabic marks. It consists of 309,720 different 192x64 annotated Arabic word images, taken from the Holy Quran, that contain 2,494,428 characters in total. These finely annotated images were randomly divided into training, validation, and test sets of 90%, 5%, and 5%, respectively. To analyze QTID, several dataset statistics are presented. Experimental evaluation shows that the current best Arabic text recognition engines, such as Tesseract and ABBYY FineReader, do not perform well on word images from the proposed dataset.
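
The random 90%/5%/5% division described in the abstract can be illustrated with a minimal sketch. The abstract does not state the shuffling procedure or random seed the authors used, so the function name split_indices, the seed value, and the use of integer image indices in place of actual image files are assumptions made only for illustration.

```python
import random


def split_indices(n_items, percents=(90, 5, 5), seed=0):
    """Randomly partition range(n_items) into train/validation/test index lists.

    Hypothetical helper: the seed and the shuffling method are assumptions,
    not the procedure used to build QTID.
    """
    if sum(percents) != 100:
        raise ValueError("percents must sum to 100")

    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded shuffle for reproducibility

    # Integer arithmetic avoids floating-point rounding in the split sizes.
    n_train = n_items * percents[0] // 100
    n_val = n_items * percents[1] // 100

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


TOTAL_IMAGES = 309_720  # number of word images reported in the abstract

train, val, test = split_indices(TOTAL_IMAGES)
print(len(train), len(val), len(test))  # prints: 278748 15486 15486
```

With the image count reported in the abstract, such a split yields 278,748 training images and 15,486 images each for validation and testing.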

Authors and Affiliations

Mahmoud Badry, Hesham H M Hassan, Hanaa Bayomi, Hussien Oakasha

  • EP ID EP278332
  • DOI 10.14569/IJACSA.2018.090351

How To Cite

Mahmoud Badry, Hesham H M Hassan, Hanaa Bayomi, Hussien Oakasha (2018). QTID: Quran Text Image Dataset. International Journal of Advanced Computer Science & Applications, 9(3), 385-391. https://europub.co.uk/articles/-A-278332