Comparative Analysis of Raw Images and Meta Feature based Urdu OCR using CNN and LSTM

Abstract

The Urdu language is written in a cursive script, so characters connect to form ligatures. To identify characters within ligatures of different scales (font sizes), a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network are used. Both models are trained on previously extracted ligature thickness graphs, from which they extract Meta features. These thickness graphs provide consistent information across font sizes. The LSTM and CNN are also trained on raw images to compare performance on both forms of input. Two corpora are used in this research: Urdu Printed Text Images (UPTI) and Centre for Language Engineering (CLE) Text Images. Overall performance of the networks ranges from 90% to 99.8%, with an average accuracy of 98.08% on Meta features and 97.07% on raw images.
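The abstract does not define how a thickness graph is computed. As a rough illustration only, the sketch below assumes that a thickness graph is the per-column count of ink pixels in a binarized ligature image, resampled to a fixed length and divided by the image height. The functions `thickness_profile`, `normalize_profile` and `upscale` are hypothetical names introduced here; this is not the authors' extraction method.

```python
from typing import List, Sequence


def thickness_profile(image: Sequence[Sequence[int]]) -> List[int]:
    """Hypothetical thickness graph: count foreground (1) pixels in each column."""
    if not image:
        return []
    width = len(image[0])
    return [sum(row[c] for row in image) for c in range(width)]


def normalize_profile(profile: Sequence[int], height: int, length: int = 8) -> List[float]:
    """Resample a column profile to `length` points and divide by image height.

    Assumption: dividing by height and resampling to a fixed length is what
    makes profiles from different font sizes comparable.
    """
    if not profile or height <= 0 or length <= 0:
        raise ValueError("profile must be non-empty; height and length must be positive")
    n = len(profile)
    # Nearest-neighbour resampling onto `length` evenly spaced source columns.
    return [profile[min(n - 1, (i * n) // length)] / height for i in range(length)]


def upscale(image: Sequence[Sequence[int]], k: int) -> List[List[int]]:
    """Integer nearest-neighbour enlargement, standing in for a larger font size."""
    return [[px for px in row for _ in range(k)] for row in image for _ in range(k)]


if __name__ == "__main__":
    small = [[0, 1, 1, 0],
             [0, 1, 0, 0]]
    big = upscale(small, 2)
    print(normalize_profile(thickness_profile(small), len(small)))
    print(normalize_profile(thickness_profile(big), len(big)))
```

Under these assumptions, a toy ligature and its 2x enlargement yield the same normalized vector, which mirrors the font-size consistency the abstract attributes to thickness graphs. Real text rendered at different sizes is not a clean pixel replication, so actual profiles would only be approximately equal.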

Authors and Affiliations

Asma Naseer, Kashif Zafar

  • EP ID EP261663
  • DOI 10.14569/IJACSA.2018.090157

How To Cite

Asma Naseer, Kashif Zafar (2018). Comparative Analysis of Raw Images and Meta Feature based Urdu OCR using CNN and LSTM. International Journal of Advanced Computer Science & Applications, 9(1), 419-424. https://europub.co.uk/articles/-A-261663