Lexicon Reduction for Urdu/Arabic Script Based Character Recognition: A Multilingual OCR

Abstract

Arabic script character recognition is a challenging task due to the complexity of the script and the huge number of ligatures. We present a method for developing a multilingual Arabic script OCR (Optical Character Recognition) system and for lexicon reduction in Arabic script and its derivative languages. The objective of the proposed method is to overcome the problem of the large datasets of Urdu and similar scripts by using the GCT (Ghost Character Theory) concept. Arabic and its sibling script languages share a similar character set; the characters differ mainly in their diacritics and in writing styles such as Naskh or Nasta'liq. With the proposed method, the lexicon for Arabic and Arabic script based languages can be reduced by a factor of up to approximately 20. The proposed multilingual Arabic script OCR approach has been evaluated for online Arabic and derivative languages such as Urdu using a BPNN (Back-Propagation Neural Network). The results show that the proposed method not only reduces the lexicon but also helps to develop a multilingual character recognition system for Arabic script.
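The GCT idea of grouping characters that share a base shape and differ only in diacritics can be illustrated with a minimal Python sketch. This is not the authors' implementation: the letter groupings, the ASCII skeleton labels (`B`, `J`, ...) and the helper names `ghost_form` and `reduce_lexicon` are hypothetical, the grouping is partial, and positional (initial/medial/final) forms are ignored.

```python
from collections import defaultdict

# HYPOTHETICAL, partial grouping of Arabic/Urdu letters that share a base
# ("ghost") shape and differ only in dots or other small marks. Each group
# gets an arbitrary ASCII label standing for its dotless skeleton.
GHOST_GROUPS = {
    "B": "\u0628\u062A\u062B\u067E\u0679",  # beh, teh, theh, peh, tteh (Urdu)
    "J": "\u062C\u062D\u062E\u0686",        # jeem, hah, khah, tcheh
    "D": "\u062F\u0630\u0688",              # dal, thal, ddal (Urdu)
    "R": "\u0631\u0632\u0691\u0698",        # reh, zain, rreh (Urdu), jeh
    "S": "\u0633\u0634",                    # seen, sheen
    "C": "\u0635\u0636",                    # sad, dad
    "T": "\u0637\u0638",                    # tah, zah
    "E": "\u0639\u063A",                    # ain, ghain
}

# Invert the groups into a per-character lookup table.
TO_GHOST = {ch: label for label, chars in GHOST_GROUPS.items() for ch in chars}


def ghost_form(ligature: str) -> str:
    """Replace each grouped letter by its skeleton label; other characters pass through."""
    return "".join(TO_GHOST.get(ch, ch) for ch in ligature)


def reduce_lexicon(lexicon):
    """Group ligatures by shared ghost skeleton.

    Returns a dict mapping each skeleton to the sorted list of original ligatures.
    """
    classes = defaultdict(set)
    for lig in lexicon:
        classes[ghost_form(lig)].add(lig)
    return {skel: sorted(ligs) for skel, ligs in classes.items()}


# Usage: letter pairs {beh, teh, theh, peh} followed by {reh, zain}.
lexicon = [a + b for a in "\u0628\u062A\u062B\u067E" for b in "\u0631\u0632"]
reduced = reduce_lexicon(lexicon)
print(len(lexicon), "->", len(reduced))  # 8 -> 1
```

Here eight letter pairs collapse into a single ghost class because every letter belongs to a dot-distinguished group. In a scheme like this, a recognizer would only need to classify the skeleton, with diacritics resolved in a separate step (an assumption of this sketch). The reduction factor for a real lexicon depends on the full grouping and ligature inventory; this toy example does not reproduce the figure reported in the paper.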

Authors and Affiliations

Saeeda Naz, Arif Iqbal Umar, Muhammad Imran Razzak

How To Cite

Saeeda Naz, Arif Iqbal Umar, Muhammad Imran Razzak (2016). Lexicon Reduction for Urdu/Arabic Script Based Character Recognition: A Multilingual OCR. Mehran University Research Journal of Engineering and Technology, 35(2), 209-218. https://europub.co.uk/articles/-A-190172