Text Separation from Graphics by Analyzing Stroke Width Variety in Persian City Maps
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2018, Vol 9, Issue 6
Abstract
Text segmentation is an active research field with many open problems. Separating the text layer from graphics is a fundamental step in exploiting both the text and the graphics information in a map. The language of the map poses a particular challenge for text layer separation, and existing methods have been proposed for non-Persian maps. In Persian, a text string is composed of one or more subwords, and each subword consists of one to several connected letters. The components of Persian text strings are therefore more diverse in size and geometric form than those of English. As a result, the overlap of Persian text and lines usually produces a complex structure that existing methods cannot handle efficiently. To address this, we calculate the stroke width variety of the input map and then estimate the average line width of the graphics by analyzing the stroke width content. Given the average width of the graphical lines, we classify the complex structure into text and graphics at the pixel level. We evaluate our method on a variety of fully crossing text and graphics in Persian maps and obtain promising results, with precision above 80% and recall above 90%.
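The three steps named in the abstract (compute stroke widths, estimate the graphical line width, classify pixels) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It rests on assumptions that do not come from the paper: stroke width is approximated as the shorter of the horizontal and vertical foreground runs through a pixel, the line width is taken as the most frequent stroke width, and the tolerance `tol` is a hypothetical parameter. All function names are illustrative.

```python
from collections import Counter


def stroke_widths(binary):
    """Approximate per-pixel stroke width as min(horizontal run, vertical run).

    `binary` is a list of rows holding 1 (ink) or 0 (background).
    This run-length proxy is an assumption, not the paper's definition.
    """
    h, w = len(binary), len(binary[0])
    horiz = [[0] * w for _ in range(h)]
    vert = [[0] * w for _ in range(h)]
    for y in range(h):  # horizontal run lengths
        x = 0
        while x < w:
            if binary[y][x]:
                start = x
                while x < w and binary[y][x]:
                    x += 1
                for k in range(start, x):
                    horiz[y][k] = x - start
            else:
                x += 1
    for x in range(w):  # vertical run lengths
        y = 0
        while y < h:
            if binary[y][x]:
                start = y
                while y < h and binary[y][x]:
                    y += 1
                for k in range(start, y):
                    vert[k][x] = y - start
            else:
                y += 1
    return [[min(horiz[y][x], vert[y][x]) if binary[y][x] else 0
             for x in range(w)] for y in range(h)]


def estimate_line_width(widths):
    """Take the most frequent non-zero stroke width as the line width (assumed heuristic)."""
    counts = Counter(v for row in widths for v in row if v > 0)
    return counts.most_common(1)[0][0]


def classify(widths, line_width, tol=1):
    """Label each pixel: '.' background, 'G' graphics, 'T' text. `tol` is hypothetical."""
    return [['.' if v == 0 else ('G' if abs(v - line_width) <= tol else 'T')
             for v in row] for row in widths]


# Toy map: a 2-pixel-thick horizontal line crossed by a thicker 6x4 text blob.
demo_map = [[0] * 14 for _ in range(10)]
for x in range(14):
    demo_map[4][x] = demo_map[5][x] = 1
for y in range(2, 8):
    for x in range(5, 9):
        demo_map[y][x] = 1

widths = stroke_widths(demo_map)
line_width = estimate_line_width(widths)
labels = classify(widths, line_width)
for row in labels:
    print(''.join(row))
```

In this toy case the pixels where the blob crosses the line are labeled as text, because their runs are long in both directions. Resolving such crossings well is the hard part of the problem, and a simple threshold like this one should not be read as a substitute for the analysis the paper describes.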
Authors and Affiliations
Ali Ghafari-Beranghar, Ehsanollah Kabir, Kaveh Kangarloo