Urdu Optical Character Recognition Technique for Jameel Noori Nastaleeq Script

Journal Title: Journal of Independent Studies and Research - Computing - Year 2015, Vol 13, Issue 1

Abstract

Urdu OCR systems have attracted the interest of many developers in recent years. Research on Urdu OCR is active, but because of the complexity of Urdu fonts the technology remains imperfect, which has kept it from reaching mainstream use. The main objective of this work was to create a technique that could be applied to any existing Urdu font or script. In this paper, the authors develop a technique that extracts text set in the Urdu font “Jameel Noori Nastaleeq” from images and converts it into editable Unicode text. The approach comprises pre-processing, connected-component labelling, feature extraction, and image comparison. The identified objects are saved as templates, which are then compared against the white pixel position length database created by the authors; each identified template is then converted into Unicode.
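The four stages named in the abstract can be illustrated with a toy pipeline. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not define the "white pixel position length" feature, so the code assumes it means per-row runs of foreground pixels recorded as (row, start column, length) inside each component's bounding box. The function names, the fixed threshold, the exact-match lookup, the right-to-left ordering rule, and the two-entry database are all hypothetical, and the toy shapes are not real Nastaleeq glyphs.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Pre-processing (assumed): dark ink becomes 1, light paper becomes 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def label_components(binary):
    """Connected-component labelling with 8-connectivity (breadth-first search)."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if binary[r][c] == 0 or seen[r][c]:
                continue
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def run_length_signature(pixels):
    """Feature extraction (assumed): (row, start, length) runs relative to the bounding box."""
    top = min(y for y, _ in pixels)
    left = min(x for _, x in pixels)
    rows = {}
    for y, x in pixels:
        rows.setdefault(y - top, []).append(x - left)
    signature = []
    for row in sorted(rows):
        cols = sorted(rows[row])
        start = prev = cols[0]
        for col in cols[1:]:
            if col != prev + 1:  # a gap closes the current run
                signature.append((row, start, prev - start + 1))
                start = col
            prev = col
        signature.append((row, start, prev - start + 1))
    return tuple(signature)


def recognize(gray, database):
    """Comparison (assumed): exact signature lookup, components read right to left."""
    components = label_components(binarize(gray))
    components.sort(key=lambda px: max(x for _, x in px), reverse=True)
    return "".join(database.get(run_length_signature(px), "?") for px in components)


# Hypothetical template database: signature -> Unicode character.
DATABASE = {
    ((0, 0, 1), (1, 0, 1), (2, 0, 1)): "\u0627",  # vertical stroke -> ARABIC LETTER ALEF
    ((0, 0, 3),): "\u0628",                        # horizontal stroke -> ARABIC LETTER BEH
}

W, K = 255, 0  # paper and ink grey levels
IMAGE = [
    [W, W, W, W, W, W, W],
    [W, W, W, W, W, K, W],
    [W, W, W, W, W, K, W],
    [K, K, K, W, W, K, W],
    [W, W, W, W, W, W, W],
]

text = recognize(IMAGE, DATABASE)
```

Exact matching is the simplest possible comparison and would be brittle on scanned input; a practical system would more likely use a tolerance or distance measure, which the abstract does not specify.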

DOI: 10.31645/jisrc/(2015).13.1.0011

How To Cite

(2015). Urdu Optical Character Recognition Technique for Jameel Noori Nastaleeq Script. Journal of Independent Studies and Research - Computing, 13(1), 81-86. https://europub.co.uk/articles/-A-643245