A Zone Classification Approach for Arabic Documents using Hybrid Features

Abstract

Zone segmentation and classification is an important step in document layout analysis. It decomposes a given scanned document into zones. Zones need to be classified into text and non-text so that only text zones are passed to a recognition engine, which eliminates the garbage output produced when non-text zones are sent to the engine. This paper proposes a framework for zone segmentation and classification. Zones are segmented using morphological operations and connected component analysis. Features are then extracted from each zone to classify it as text or non-text. The feature set is a hybrid of texture-based and connected-component-based features. Effective features are selected using a genetic algorithm, and the selected features are fed into a linear SVM classifier for zone classification. System evaluation shows that the proposed zone classification works well on multi-font and multi-size documents with a variety of layouts, including historical documents.
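
The segmentation step described in the abstract (a morphological operation followed by connected component analysis) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the rectangular structuring element, its size, the 4-connectivity, and the toy `page` image are all assumptions chosen for illustration. Dilation merges nearby foreground pixels, and each connected component of the dilated image is then reported as one candidate zone.

```python
from collections import deque

def dilate(img, rx, ry):
    """Binary dilation with a (2*ry+1) x (2*rx+1) rectangle (assumed shape)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                # Paint the whole rectangle centred on each foreground pixel.
                for yy in range(max(0, y - ry), min(h, y + ry + 1)):
                    for xx in range(max(0, x - rx), min(w, x + rx + 1)):
                        out[yy][xx] = 1
    return out

def connected_zones(img):
    """Bounding boxes (top, left, bottom, right), inclusive, of the
    4-connected foreground components, in raster-scan order."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(y, x)])
                top, left, bottom, right = y, x, y, x
                while queue:  # breadth-first flood fill of one component
                    cy, cx = queue.popleft()
                    top, bottom = min(top, cy), max(bottom, cy)
                    left, right = min(left, cx), max(right, cx)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes

def segment_zones(img, rx=1, ry=0):
    """Hypothetical helper: dilate, then take components as zones."""
    return connected_zones(dilate(img, rx, ry))

# Toy binary page (1 = ink): two pairs of isolated pixels.
page = [
    [1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
zones = segment_zones(page)
```

On the toy page, the undilated image contains four separate components, while a one-pixel horizontal dilation merges each pair into a single zone. In a full system along the lines the abstract describes, each zone's bounding box would then be passed on to feature extraction and the classifier; those stages are not sketched here.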

Authors and Affiliations

Amany M. Hesham, Sherif Abdou, Amr Badr, Mohsen Rashwan, Hassanin M. Al-Barhamtoshy

  • EP ID EP138819
  • DOI 10.14569/IJACSA.2016.070722

How To Cite

Amany M. Hesham, Sherif Abdou, Amr Badr, Mohsen Rashwan, Hassanin M. Al-Barhamtoshy (2016). A Zone Classification Approach for Arabic Documents using Hybrid Features. International Journal of Advanced Computer Science & Applications, 7(7), 158-162. https://europub.co.uk/articles/-A-138819