A Zone Classification Approach for Arabic Documents using Hybrid Features

Abstract

Zone segmentation and classification are important steps in document layout analysis. Segmentation decomposes a scanned document into zones, and classification labels each zone as text or non-text so that only text zones are passed to a recognition engine. This avoids the garbage output produced when non-text zones are sent to the engine. This paper proposes a framework for zone segmentation and classification. Zones are segmented using morphological operations and connected component analysis. Features are then extracted from each zone to classify it as text or non-text. The feature set is a hybrid of texture-based and connected-component-based features. The most effective features are selected using a genetic algorithm and fed into a linear SVM classifier for zone classification. System evaluation shows that the proposed zone classification works well on multi-font and multi-size documents with a variety of layouts, including historical documents.
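
The segmentation step named in the abstract (a morphological operation followed by connected component analysis) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which morphological operation, structuring element, or connectivity is used, so the square dilation, the `radius` parameter, the 8-connectivity, and the function names `dilate` and `zones` are all assumptions made for illustration. The page is a small binary grid where 1 marks an ink pixel.

```python
from collections import deque


def dilate(img, radius=1):
    """Binary dilation with a square (2*radius+1) structuring element (assumed)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                # Turn on every pixel in the square neighbourhood of an ink pixel.
                for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[ny][nx] = 1
    return out


def zones(img, radius=1):
    """Return bounding boxes (top, left, bottom, right) of the 8-connected
    components of the dilated image. Boxes describe the dilated mask, so
    they extend up to `radius` pixels beyond the original ink."""
    mask = dilate(img, radius)
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                top, left, bottom, right = y, x, y, x
                queue = deque([(y, x)])
                seen[y][x] = True
                while queue:  # breadth-first flood fill of one component
                    cy, cx = queue.popleft()
                    top, bottom = min(top, cy), max(bottom, cy)
                    left, right = min(left, cx), max(right, cx)
                    for ny in range(max(0, cy - 1), min(h, cy + 2)):
                        for nx in range(max(0, cx - 1), min(w, cx + 2)):
                            if mask[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes


# Toy page: two nearby marks on the left, one vertical stroke on the right.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
print(zones(page))
```

In this sketch the dilation merges the two nearby marks on the left into one zone, while the stroke on the right stays separate. Calling `zones(page, radius=0)` skips the merging and yields one box per raw connected component. In the paper's pipeline, each resulting zone would then go on to feature extraction and classification.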

Authors and Affiliations

Amany M. Hesham, Sherif Abdou, Amr Badr, Mohsen Rashwan, Hassanin M. Al-Barhamtoshy

  • EP ID EP138819
  • DOI 10.14569/IJACSA.2016.070722

How To Cite

Amany M. Hesham, Sherif Abdou, Amr Badr, Mohsen Rashwan, Hassanin M. Al-Barhamtoshy (2016). A Zone Classification Approach for Arabic Documents using Hybrid Features. International Journal of Advanced Computer Science & Applications, 7(7), 158-162. https://europub.co.uk/articles/-A-138819