A Word & Character N-Gram based Arabic OCR Error Simulation model

Journal Title: INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY - Year 2014, Vol 12, Issue 8

Abstract

This paper presents a new model that aims to improve the retrieval effectiveness of OCR-degraded Arabic text. The proposed model simulates Arabic OCR recognition errors using both word-based and character N-gram approaches. The user's search query is then expanded with the expected OCR errors. The expanded query gives higher precision and recall than the original query when searching OCR-degraded Arabic text. The proposed model shows a significant increase in degraded-text retrieval effectiveness over previous models: the retrieval effectiveness of the new model is 93%, while the best published effectiveness was 84% for the word-based approach and 56% for the character-based approach.
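The query-expansion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the word-level and character N-gram models are combined, so the sketch assumes a simple policy (use the word-level table when the term is known, otherwise fall back to character N-gram substitutions). The function names, the `max_n` parameter, and all table entries are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set

# Hypothetical word-level error table: clean word -> OCR outputs observed for it.
# The entries are illustrative only and are not taken from the paper.
WORD_ERRORS: Dict[str, List[str]] = {
    "كتاب": ["كناب"],
}

# Hypothetical character n-gram error table: clean n-gram -> possible OCR outputs.
# Letters that differ only by dots are used here as plausible confusions.
NGRAM_ERRORS: Dict[str, List[str]] = {
    "ت": ["ث", "ن"],
    "ب": ["ت"],
}


def ngram_variants(word: str, table: Dict[str, List[str]], max_n: int = 2) -> Set[str]:
    """Return every variant of `word` with exactly one n-gram substituted."""
    variants: Set[str] = set()
    for n in range(1, max_n + 1):
        for i in range(len(word) - n + 1):
            gram = word[i:i + n]
            for replacement in table.get(gram, []):
                variants.add(word[:i] + replacement + word[i + n:])
    return variants


def expand_term(term: str,
                word_table: Dict[str, List[str]],
                ngram_table: Dict[str, List[str]]) -> List[str]:
    """Return the term followed by its simulated OCR variants.

    Assumed policy: prefer the word-level table, and fall back to
    character n-gram substitution for words it does not cover.
    """
    if term in word_table:
        variants = set(word_table[term])
    else:
        variants = ngram_variants(term, ngram_table)
    return [term] + sorted(variants - {term})


def expand_query(query: str,
                 word_table: Dict[str, List[str]],
                 ngram_table: Dict[str, List[str]]) -> List[List[str]]:
    """Expand each whitespace-separated term; each inner list is an OR-group."""
    return [expand_term(t, word_table, ngram_table) for t in query.split()]


if __name__ == "__main__":
    for group in expand_query("كتاب بيت", WORD_ERRORS, NGRAM_ERRORS):
        print(" OR ".join(group))
```

Each inner list can be passed to a search engine as a disjunction, so a document containing a misrecognized form of a term can still match. A fuller model would presumably weight each variant by its estimated error probability, which this sketch leaves out.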

Authors and Affiliations

Mostafa Ezzat, Tarek Ahmed ElGhazaly, Mervat Gheith

  • EP ID EP650450
  • DOI 10.24297/ijct.v12i8.2999

How To Cite

Mostafa Ezzat, Tarek Ahmed ElGhazaly, Mervat Gheith (2014). A Word & Character N-Gram based Arabic OCR Error Simulation model. INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY, 12(8), 3758-3767. https://europub.co.uk/articles/-A-650450