A Word & Character N-Gram based Arabic OCR Error Simulation model

Journal Title: INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY - Year 2014, Vol 12, Issue 8

Abstract

This paper presents a new model that aims to improve retrieval effectiveness for OCR-degraded Arabic text. The proposed model simulates Arabic OCR recognition errors using both word-based and character N-gram approaches. The user's search query is then expanded with the expected OCR errors. The expanded query yields higher precision and recall than the original query when searching OCR-degraded Arabic text. The proposed model shows a significant increase in degraded-text retrieval effectiveness over previous models: its retrieval effectiveness is 93%, while the best published effectiveness was 84% for the word-based approach and 56% for the character-based approach.
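The query-expansion idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`ngram_variants`, `expand_query`), the two confusion tables, and their sample entries are all hypothetical, and the sketch applies only one character N-gram substitution per variant. A real error model would be learned from aligned clean and OCR-degraded text.

```python
def ngram_variants(word, char_confusions, max_n=2):
    """Return variants of `word` with one character n-gram replaced.

    `char_confusions` is a hypothetical table mapping a character n-gram
    to the strings an OCR engine might output instead of it.
    """
    variants = []
    for i in range(len(word)):
        for n in range(1, max_n + 1):
            if i + n > len(word):
                break  # n-gram would run past the end of the word
            gram = word[i:i + n]
            for repl in char_confusions.get(gram, []):
                variants.append(word[:i] + repl + word[i + n:])
    return variants


def expand_query(query, word_confusions, char_confusions, max_n=2):
    """Expand each query term with simulated OCR errors.

    Combines whole-word confusions with character n-gram confusions and
    returns one list of alternatives per term, original term first.
    """
    expanded = []
    for term in query.split():
        candidates = [term]
        candidates += word_confusions.get(term, [])
        candidates += ngram_variants(term, char_confusions, max_n)
        # dict.fromkeys removes duplicates while keeping insertion order
        expanded.append(list(dict.fromkeys(candidates)))
    return expanded


# Illustrative tables only (assumed, not taken from the paper): Arabic
# letters that differ mainly by their dots are plausible OCR confusions.
SAMPLE_CHAR_CONFUSIONS = {"ب": ["ت", "ن"], "ح": ["ج", "خ"]}
SAMPLE_WORD_CONFUSIONS = {"كتاب": ["كناب"]}

if __name__ == "__main__":
    for alternatives in expand_query(
        "كتاب بحر", SAMPLE_WORD_CONFUSIONS, SAMPLE_CHAR_CONFUSIONS
    ):
        print(" OR ".join(alternatives))
```

Each inner list can be treated as an OR group in the search engine's query syntax, so a document containing any simulated misrecognition of a term still matches that term.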

Authors and Affiliations

Mostafa Ezzat, Tarek Ahmed ElGhazaly, Mervat Gheith



  • EP ID EP650450
  • DOI 10.24297/ijct.v12i8.2999

How To Cite

Mostafa Ezzat, Tarek Ahmed ElGhazaly, Mervat Gheith (2014). A Word & Character N-Gram based Arabic OCR Error Simulation model. INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY, 12(8), 3758-3767. https://europub.co.uk/articles/-A-650450