Lexicon Reduction for Urdu/Arabic Script Based Character Recognition: A Multilingual OCR

Abstract

Arabic script character recognition is a challenging task due to the complexity of the script and the huge number of ligatures. We present a method for developing a multilingual Arabic script OCR (Optical Character Recognition) system and for lexicon reduction for Arabic script and its derivative languages. The objective of the proposed method is to overcome the large dataset problem of Urdu and similar scripts by using the GCT (Ghost Character Theory) concept. Arabic and its sibling script languages share a similar character set, i.e. the characters differ mainly in their diacritics and in writing styles such as Naskh or Nasta'liq. Based on the proposed method, the lexicon for Arabic and Arabic script based languages can be reduced by a factor of up to approximately 20. The proposed multilingual Arabic script OCR approach has been evaluated for online Arabic and its derivative languages such as Urdu using a BPNN (Back Propagation Neural Network). The results show that the proposed method helps not only to reduce the lexicon but also to develop a multilanguage character recognition system for Arabic script.
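The ghost character idea summarized in the abstract can be illustrated with a minimal Python sketch. It assumes that each letter can be split into a dotless base shape (the "ghost") plus a diacritic mark, so that letters sharing a base shape collapse into one recognition class. The table `GHOST_TABLE`, the helper names `to_ghost` and `reduction_ratio`, and the particular grouping of letters are hypothetical and chosen for illustration only; they are not the mapping or the implementation used in the paper, and the toy ratio below is not meant to reproduce the reported reduction figure.

```python
# Hypothetical ghost-character table (an assumption for illustration, not the
# paper's table): letter -> (dotless base shape, description of the diacritic).
GHOST_TABLE = {
    "\u0628": ("\u066E", "1 dot below"),      # beh
    "\u067E": ("\u066E", "3 dots below"),     # peh
    "\u062A": ("\u066E", "2 dots above"),     # teh
    "\u062B": ("\u066E", "3 dots above"),     # theh
    "\u0679": ("\u066E", "small tah above"),  # tteh (Urdu)
    "\u062C": ("\u062D", "1 dot below"),      # jeem
    "\u0686": ("\u062D", "3 dots below"),     # tcheh
    "\u062D": ("\u062D", None),               # hah (already dotless)
    "\u062E": ("\u062D", "1 dot above"),      # khah
    "\u062F": ("\u062F", None),               # dal (already dotless)
    "\u0688": ("\u062F", "small tah above"),  # ddal (Urdu)
    "\u0630": ("\u062F", "1 dot above"),      # thal
    "\u0631": ("\u0631", None),               # reh (already dotless)
    "\u0691": ("\u0631", "small tah above"),  # rreh (Urdu)
    "\u0632": ("\u0631", "1 dot above"),      # zain
    "\u0698": ("\u0631", "3 dots above"),     # jeh
}


def to_ghost(word):
    """Replace every known letter by its base shape; keep unknown characters."""
    return "".join(GHOST_TABLE.get(ch, (ch, None))[0] for ch in word)


def reduction_ratio(words):
    """Ratio of distinct lexicon entries before and after ghost mapping."""
    unique_full = set(words)
    unique_ghost = {to_ghost(w) for w in unique_full}
    return len(unique_full) / len(unique_ghost)


if __name__ == "__main__":
    # Treat each letter in the toy table as a one-character lexicon entry.
    toy_lexicon = list(GHOST_TABLE)
    print(len(toy_lexicon), "entries ->",
          len({to_ghost(w) for w in toy_lexicon}), "ghost classes")
    print("reduction ratio:", reduction_ratio(toy_lexicon))
```

In a scheme like this, a recognizer would only need to classify the ghost shapes, with the diacritics recognized separately and recombined afterwards to recover the language-specific letter. That separation is what would let one base-shape classifier serve several Arabic script languages.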

Authors and Affiliations

Saeeda Naz, Arif Iqbal Umar, Muhammad Imran Razzak

How To Cite

Saeeda Naz, Arif Iqbal Umar, Muhammad Imran Razzak (2016). Lexicon Reduction for Urdu/Arabic Script Based Character Recognition: A Multilingual OCR. Mehran University Research Journal of Engineering and Technology, 35(2), 209-218. https://europub.co.uk/articles/-A-190172