Comparative Analysis of Raw Images and Meta Feature based Urdu OCR using CNN and LSTM
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2018, Vol 9, Issue 1
Abstract
Urdu is written in a cursive script, so adjacent characters join to form ligatures. To identify the characters within ligatures at different scales (font sizes), a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network are used. Both models are trained on previously extracted ligature thickness graphs, from which they derive Meta features. These thickness graphs provide consistent information across font sizes. Both networks are also trained on raw images so that performance on the two forms of input can be compared. Two corpora are used in this research: Urdu Printed Text Images (UPTI) and Centre for Language Engineering (CLE) Text Images. Overall performance of the networks ranges between 90% and 99.8%. Average accuracy is 98.08% on Meta features and 97.07% on raw images.
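The abstract does not define how a thickness graph is computed. The following is only a minimal sketch of one plausible reading, assuming that "thickness" means the number of ink pixels per image column, divided by the ligature height and resampled to a fixed length. The function name `thickness_profile` and the parameter `n_samples` are hypothetical and do not come from the paper.

```python
import numpy as np

def thickness_profile(binary, n_samples=32):
    """Column-wise ink thickness of a binarised ligature image.

    ASSUMPTION: this is an illustrative definition, not the paper's.
    `binary` is a 2-D array in which nonzero pixels are ink.
    """
    ink = np.asarray(binary) > 0
    cols = ink.sum(axis=0).astype(float)   # ink pixels in each column
    used = np.flatnonzero(cols)
    if used.size == 0:                     # blank image: nothing to measure
        return np.zeros(n_samples)
    cols = cols[used[0]:used[-1] + 1]      # crop to the ligature's width
    rows = np.flatnonzero(ink.any(axis=1))
    height = rows[-1] - rows[0] + 1        # ligature height in pixels
    cols /= height                         # divide out the font size
    # Resample to a fixed length so every ligature yields the same input size.
    src = np.linspace(0.0, 1.0, cols.size)
    dst = np.linspace(0.0, 1.0, n_samples)
    return np.interp(dst, src, cols)
```

Under this assumed definition, enlarging a ligature scales both the per-column ink count and the ligature height by the same factor, so the normalised values stay roughly constant. That is one way a thickness-based representation could remain consistent across font sizes.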
Authors and Affiliations
Asma Naseer, Kashif Zafar