QTID: Quran Text Image Dataset
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2018, Vol 9, Issue 3
Abstract
Improving the accuracy of Arabic text recognition in images requires a large, modern dataset, since data fuels many modern machine learning models. This paper proposes a new dataset, the Quran Text Image Dataset (QTID), the first Arabic dataset that includes Arabic marks. It consists of 309,720 different annotated Arabic word images of size 192x64, containing 2,494,428 characters in total, all taken from the Holy Quran. These finely annotated images were randomly divided into training, validation, and test sets of 90%, 5%, and 5%, respectively. To analyze QTID, several dataset statistics are presented. Experimental evaluation shows that leading Arabic text recognition engines such as Tesseract and ABBYY FineReader do not perform well on word images from the proposed dataset.
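As a rough illustration of the 90%/5%/5% random split described in the abstract, the sketch below shuffles image indices and cuts them into three parts. The abstract does not say how the authors performed the split, so the function name `split_indices`, the fixed seed, and the use of integer percentages are assumptions made only for this example.

```python
import random


def split_indices(n_items, percents=(90, 5, 5), seed=0):
    """Shuffle indices 0..n_items-1 and cut them into train/val/test lists.

    Hypothetical helper: the seed and the rounding rule (floor for train and
    validation, remainder goes to test) are illustrative assumptions.
    """
    if sum(percents) != 100:
        raise ValueError("percents must sum to 100")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_train = n_items * percents[0] // 100
    n_val = n_items * percents[1] // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains
    return train, val, test


# 309,720 is the number of word images reported in the abstract.
train, val, test = split_indices(309_720)
print(len(train), len(val), len(test))
```

With the image count given in the abstract, this yields 278,748 training images and 15,486 images each for validation and testing.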
Authors and Affiliations
Mahmoud Badry, Hesham H M Hassan, Hanaa Bayomi, Hussien Oakasha
PEDAGOGY: INSTRUCTIVISM TO SOCIO-CONSTRUCTIVISM THROUGH VIRTUAL REALITY
Learning theories evolved with time, beginning with instructivism, constructivism, to social constructivism. These theories no doubt were applied in education and they had their effects on learners. Technology advanced,...
An Efficient Algorithm to Automated Discovery of Interesting Positive and Negative Association Rules
Association rule mining is a very efficient technique for finding strong relations between correlated data. The correlation of data gives a meaningful extraction process. For the discovering frequent items and the mining of...
Robust R Peak and QRS detection in Electrocardiogram using Wavelet Transform
In this paper a robust R Peak and QRS detection using Wavelet Transform has been developed. Wavelet Transform provides efficient localization in both time and frequency. Discrete Wavelet Transform (DWT) has been used to...
Dynamic Reconfiguration of LPWANs Pervasive System using Multi-agent Approach
The development of the Low Power Wide Area Network (LPWAN) has given new hope for the Internet of Things and M2M networks to become the most prevalent network type in the industrial world in the near future. This type of net...
Audio Watermarking with Error Correction
In recent times, communication through the internet has tremendously facilitated the distribution of multimedia data. Although this is indubitably a boon, one of its repercussions is that it has also given impetus to the...