A Zone Classification Approach for Arabic Documents using Hybrid Features

Abstract

Zone segmentation and classification is an important step in document layout analysis: it decomposes a scanned document into zones, which must then be classified as text or non-text so that only text zones are passed to a recognition engine. This eliminates the garbage output that results from sending non-text zones to the engine. This paper proposes a framework for zone segmentation and classification. Zones are segmented using morphological operations and connected component analysis. Features are then extracted from each zone to classify it as text or non-text. The features are a hybrid of texture-based and connected-component-based features. Effective features are selected using a genetic algorithm, and the selected features are fed into a linear SVM classifier for zone classification. System evaluation shows that the proposed zone classification works well on multi-font and multi-size documents with a variety of layouts, even on historical documents.
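
The segmentation step mentioned in the abstract (a morphological operation followed by connected component analysis) can be illustrated with a minimal sketch. The abstract does not specify the structuring element, the connectivity, or any parameter values, so the horizontal rectangular element, the 4-connectivity, and the function names `dilate`, `connected_zones`, and `segment_zones` below are assumptions made for illustration, not the authors' implementation.

```python
from collections import deque

def dilate(img, radius_x=1, radius_y=0):
    """Binary dilation with a (2*radius_y+1) x (2*radius_x+1) rectangle (assumed shape)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                # Spread each foreground pixel over the rectangle centred on it.
                for yy in range(max(0, y - radius_y), min(h, y + radius_y + 1)):
                    for xx in range(max(0, x - radius_x), min(w, x + radius_x + 1)):
                        out[yy][xx] = 1
    return out

def connected_zones(img):
    """Bounding boxes (top, left, bottom, right) of 4-connected foreground regions."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(y, x)])
                top, left, bottom, right = y, x, y, x
                while queue:  # breadth-first flood fill of one component
                    cy, cx = queue.popleft()
                    top, bottom = min(top, cy), max(bottom, cy)
                    left, right = min(left, cx), max(right, cx)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes

def segment_zones(img, radius_x=1, radius_y=0):
    """Merge nearby components by dilation, then box each merged region."""
    return connected_zones(dilate(img, radius_x, radius_y))

# Toy binary "page": 1 = ink, 0 = background.
page = [
    [1, 0, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 0, 0],
]
zones = segment_zones(page)
print(zones)  # [(0, 0, 0, 3), (0, 6, 1, 7), (3, 0, 3, 5)]
```

Without dilation the toy page contains six separate components; the horizontal dilation merges neighbouring marks on the same row into three candidate zones. The boxes are measured on the dilated image, so they extend slightly beyond the original ink. In a full pipeline of the kind the abstract describes, features would then be computed per zone and passed to a classifier.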

Authors and Affiliations

Amany M. Hesham, Sherif Abdou, Amr Badr, Mohsen Rashwan, Hassanin M. Al-Barhamtoshy

  • EP ID EP138819
  • DOI 10.14569/IJACSA.2016.070722

How To Cite

Amany M. Hesham, Sherif Abdou, Amr Badr, Mohsen Rashwan, Hassanin M. Al-Barhamtoshy (2016). A Zone Classification Approach for Arabic Documents using Hybrid Features. International Journal of Advanced Computer Science & Applications, 7(7), 158-162. https://europub.co.uk/articles/-A-138819