Extracting Text from Image Document and Displaying Its Related Information

Abstract

Image text is text that is embedded or written in images of various forms. It can be found in captured images, scanned documents, magazines, newspapers, posters, and similar sources. Such text is now widely available and is important for representing, describing, and transferring information, which helps people in areas such as communication, problem solving, availability, job creation, cost effectiveness, productivity, globalization, and bridging cultural gaps. The information in these image documents is more efficient to use and easier to access once it is converted to text form. The process of converting image text into plain text is called text extraction. Text extraction is useful for retrieving, searching, editing, documenting, archiving, and reporting image text. However, text extraction is extremely difficult and challenging because the text varies in size, orientation, style, and alignment, and because it may be embedded in complex colored document images, degraded document images, or low-quality images with low contrast and complex backgrounds. Techniques such as the Connected Component Method, the Mathematical Morphology Method, the Edge-Based Method, and the Texture-Based Method have been used previously, but each has its own limitations when measured by parameters such as precision, recall, and F-score. This paper discusses text extraction from image documents using a combination of two powerful methods, the Connected Component Method and the Edge-Based Method, in order to improve the performance and accuracy of text extraction. The system is implemented in MATLAB code integrated with the MATLAB/Simulink tool and is tested on the Document Image Binarization Competition (DIBCO) 2017 dataset. Finally, the extracted and recognized text is converted to speech for use by visually impaired people.
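
The abstract names precision, recall, and F-score as the evaluation parameters but does not say how they are computed. The following is a minimal sketch, assuming a pixel-level comparison between a predicted binary text mask and a ground-truth mask. The function name, the mask layout (1 = text pixel), and the toy data are illustrative assumptions, not the authors' MATLAB implementation or the DIBCO evaluation tool.

```python
def precision_recall_fscore(predicted, ground_truth):
    """Pixel-level scores for two equally sized binary masks (1 = text pixel).

    Hypothetical helper for illustration; not taken from the paper.
    """
    tp = fp = fn = 0
    for pred_row, true_row in zip(predicted, ground_truth):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1          # text pixel correctly extracted
            elif p and not t:
                fp += 1          # background wrongly marked as text
            elif t and not p:
                fn += 1          # text pixel that was missed

    # Guard against empty masks to avoid division by zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fscore = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
    return precision, recall, fscore


# Toy masks (assumed data): 3 true positives, 1 false positive, 1 false negative.
ground_truth = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
predicted = [
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

precision, recall, fscore = precision_recall_fscore(predicted, ground_truth)
print(precision, recall, fscore)
```

Precision penalizes background pixels wrongly marked as text, recall penalizes missed text pixels, and the F-score is their harmonic mean, so a method has to do well on both to score highly.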

Authors and Affiliations

K. N. Natei, J. Viradiya, S. Sasi kumar

  • EP ID EP394275
  • DOI 10.9790/9622-0805052733

How To Cite

K. N. Natei, J. Viradiya, S. Sasi kumar (2018). Extracting Text from Image Document and Displaying Its Related Information. International Journal of Engineering Research and Applications, 8(5), 27-33. https://europub.co.uk/articles/-A-394275