Translating Images into Text Descriptions and Speech Synthesis for Learning Purpose

Abstract

An image-to-text and speech conversion system can improve the accessibility of images for visually impaired and physically challenged people by helping them understand the scene an image depicts, and the system can be trained to interpret images much as the human brain does. Image segmentation and edge detection play an important role in implementing the proposed system. The system generates text descriptions for an input image supplied by the user. Generating sentences object by object, and mapping prepositions and conjunctions between them, is a challenging task. The framework formulates the interaction between image segmentation and object recognition in the context of the Canny edge detection algorithm. The system proceeds through several phases, including pre-processing, feature extraction, object recognition, edge detection, image segmentation, and text-to-speech (TTS) conversion. The system's database contains a large set of sample images, spanning several categories, which are used for training. The accuracy of the proposed system depends on correct recognition of objects, and sentences are formed from the recognized objects. The system consists of two main modules: image to text and text to speech. The image-to-text module generates natural-language text descriptions based on an understanding of the image. The text-to-speech module synthesizes English speech from the natural-language description.
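
The object-wise sentence generation with preposition and conjunction mapping mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' rules, so everything here is an assumption made for illustration: the `DetectedObject` bounding-box format, the four spatial prepositions, the rule of relating every object to the first recognized one, and the use of "and" as the only conjunction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectedObject:
    """A recognized object with an axis-aligned bounding box (hypothetical format)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def center(self):
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def preposition(a, b):
    """Pick a spatial preposition for `a` relative to `b` (assumed rule set).

    Uses image coordinates, where y grows downward, and compares box centers.
    """
    (ax, ay), (bx, by) = a.center, b.center
    dx, dy = ax - bx, ay - by
    if abs(dy) > abs(dx):
        return "above" if dy < 0 else "below"
    return "to the left of" if dx < 0 else "to the right of"


def article(label):
    """Choose 'a' or 'an' from the first letter (a rough heuristic)."""
    return "an" if label[0].lower() in "aeiou" else "a"


def describe(objects):
    """Form one sentence relating every object to the first one in the list."""
    if not objects:
        return "No objects were recognized."
    anchor = objects[0]
    if len(objects) == 1:
        return f"There is {article(anchor.label)} {anchor.label}."
    clauses = [
        f"{article(o.label)} {o.label} {preposition(o, anchor)} the {anchor.label}"
        for o in objects[1:]
    ]
    # Conjunction mapping: commas between clauses, "and" before the last one.
    if len(clauses) == 1:
        joined = clauses[0]
    else:
        joined = ", ".join(clauses[:-1]) + " and " + clauses[-1]
    return f"There is {joined}."


if __name__ == "__main__":
    scene = [
        DetectedObject("table", 10, 50, 90, 90),
        DetectedObject("cup", 40, 30, 50, 48),
        DetectedObject("dog", 0, 60, 8, 90),
    ]
    print(describe(scene))
```

In a full pipeline of the kind the abstract describes, the recognized objects would come from the segmentation and recognition phases, and the resulting sentence would be handed to the text-to-speech module.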

Authors and Affiliations

Yogesh N. Shinde, Mrunmayee Patil

How To Cite

Yogesh N. Shinde, Mrunmayee Patil (2016). Translating Images into Text Descriptions and Speech Synthesis for Learning Purpose. International Journal for Research in Applied Science and Engineering Technology (IJRASET), 4(6), -. https://europub.co.uk/articles/-A-22315