Translating Images into Text Descriptions and Speech Synthesis for Learning Purpose

Abstract

An image-to-text and speech conversion system can improve the accessibility of images for visually impaired and physically challenged people by helping them understand the scene an image depicts, and it can also be used to train a system to interpret images as the human brain does. Image segmentation and edge detection play an important role in implementing the proposed system. The system generates text descriptions for an input image supplied by the user. Generating sentences object by object, together with preposition and conjunction mapping, is a challenging task. The framework formulates the interaction between image segmentation and object recognition using the Canny algorithm. The system proceeds through several phases: pre-processing, feature extraction, object recognition, edge detection, image segmentation and text-to-speech (TTS) conversion. The system's database consists of a large set of sample images spanning several categories, which are used for training. The accuracy of the proposed system depends on correct recognition of objects, because sentences are formed from the recognized objects. The system consists of two main modules: image to text and text to speech. The image-to-text module generates natural-language text descriptions based on its understanding of the image. The text-to-speech module synthesizes English speech from the natural-language description.
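To make the object-wise sentence generation with preposition and conjunction mapping more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that object recognition has already produced labels with bounding boxes. The `DetectedObject` class, the `preposition`, `article` and `describe` helpers, and the rule of comparing box centers are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DetectedObject:
    """Hypothetical output of the object recognition phase."""
    label: str
    x: float  # left edge of the bounding box
    y: float  # top edge (image coordinates: y grows downward)
    w: float  # box width
    h: float  # box height

    @property
    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


def preposition(a, b):
    """Pick a spatial preposition for object a relative to object b.

    Assumption: the dominant axis of the offset between the two
    box centers decides between vertical and horizontal wording.
    """
    ax, ay = a.center
    bx, by = b.center
    dx, dy = ax - bx, ay - by
    if abs(dy) > abs(dx):
        return "above" if dy < 0 else "below"
    return "to the left of" if dx < 0 else "to the right of"


def article(word):
    """Choose 'a' or 'an' from the first letter (a rough heuristic)."""
    return "an" if word[0].lower() in "aeiou" else "a"


def describe(objects):
    """Build one sentence; the first object is the reference object."""
    if not objects:
        return "No objects were recognized."
    anchor = objects[0]
    if len(objects) == 1:
        return f"There is {article(anchor.label)} {anchor.label}."
    clauses = [
        f"{article(o.label)} {o.label} {preposition(o, anchor)} the {anchor.label}"
        for o in objects[1:]
    ]
    # Conjunction mapping: join the last clause with "and".
    if len(clauses) == 1:
        joined = clauses[0]
    else:
        joined = ", ".join(clauses[:-1]) + " and " + clauses[-1]
    return f"There is {article(anchor.label)} {anchor.label}, with {joined}."


scene = [
    DetectedObject("table", 40, 60, 40, 20),
    DetectedObject("lamp", 50, 10, 10, 20),
    DetectedObject("dog", 100, 65, 20, 15),
]
print(describe(scene))
```

In a full pipeline, the string returned by `describe` would be handed to the text-to-speech module. The sketch stops at text so that it does not assume any particular TTS engine.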

Authors and Affiliations

Yogesh N. Shinde, Mrunmayee Patil

How To Cite

Yogesh N. Shinde, Mrunmayee Patil (2016). Translating Images into Text Descriptions and Speech Synthesis for Learning Purpose. International Journal for Research in Applied Science and Engineering Technology (IJRASET), 4(6), -. https://europub.co.uk/articles/-A-22315