Text Separation from Graphics by Analyzing Stroke Width Variety in Persian City Maps

Abstract

Text segmentation is an active research field with many open problems. Separating the text layer from the graphics is a fundamental step toward exploiting both the text and the graphics information in a map. The language of the map is a challenging factor in text layer separation, and all current methods were proposed for non-Persian maps. In Persian, a text string is composed of one or more subwords, and each subword is composed of one or more connected letters. The components of Persian text strings are therefore more diverse in size and geometric form than those of English. As a result, overlapping Persian text and lines usually produce complex structures that existing methods cannot handle efficiently. To address this, we compute the stroke width variety of the input map and estimate the average line width of the graphics by analyzing the stroke width content. Using the estimated average width of the graphical lines, we classify each complex structure into text and graphics at the pixel level. We evaluate our method on a variety of fully crossing text and graphics in Persian maps and obtain promising results, with precision above 80% and recall above 90%.
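The pipeline outlined in the abstract (stroke width computation, line width estimation, pixel-level classification) can be illustrated with a minimal sketch. The abstract does not say how stroke width is measured or how the average line width is derived, so everything here is an assumption for illustration and not the authors' method: stroke width is approximated by the shorter of the horizontal and vertical run lengths through each pixel, the line width is taken as the most frequent stroke width, and `tol` is a hypothetical tolerance parameter.

```python
from collections import Counter

def row_runs(row):
    """For each foreground pixel in a row, the length of the run of 1s containing it."""
    out = [0] * len(row)
    i = 0
    while i < len(row):
        if row[i]:
            j = i
            while j < len(row) and row[j]:
                j += 1
            for k in range(i, j):
                out[k] = j - i
            i = j
        else:
            i += 1
    return out

def stroke_width_map(img):
    """Assumed stroke width proxy: min of horizontal and vertical run length."""
    horiz = [row_runs(r) for r in img]
    cols = [row_runs(list(c)) for c in zip(*img)]   # runs along each column
    vert = [list(r) for r in zip(*cols)]            # transpose back to row order
    return [[min(a, b) for a, b in zip(hr, vr)] for hr, vr in zip(horiz, vert)]

def estimate_line_width(swm):
    """Assumed estimator: the most frequent non-zero stroke width."""
    counts = Counter(w for row in swm for w in row if w > 0)
    return counts.most_common(1)[0][0]

def classify(swm, line_width, tol=0):
    """Label each pixel: 0 = background, 1 = graphics, 2 = text."""
    return [[0 if w == 0 else (1 if abs(w - line_width) <= tol else 2)
             for w in row] for row in swm]

def demo_image():
    """Toy 8x12 binary map: a 2-pixel-thick line crossed by a 3-pixel-wide bar."""
    img = [[0] * 12 for _ in range(8)]
    for y in (3, 4):
        for x in range(12):
            img[y][x] = 1          # horizontal line standing in for a road
    for y in range(1, 7):
        for x in range(5, 8):
            img[y][x] = 1          # thicker bar standing in for a text stroke
    return img

swm = stroke_width_map(demo_image())
line_width = estimate_line_width(swm)
labels = classify(swm, line_width)
```

In this toy case the pixels where the bar crosses the line get a large stroke width and fall into the text class, which is a simplification; the paper addresses such full crossings with its own analysis, which the abstract does not detail.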

Authors and Affiliations

Ali Ghafari-Beranghar, Ehsanollah Kabir, Kaveh Kangarloo

  • EP ID EP322108
  • DOI 10.14569/IJACSA.2018.090632

How To Cite

Ali Ghafari-Beranghar, Ehsanollah Kabir, Kaveh Kangarloo (2018). Text Separation from Graphics by Analyzing Stroke Width Variety in Persian City Maps. International Journal of Advanced Computer Science & Applications, 9(6), 222-229. https://europub.co.uk/articles/-A-322108