Deep Learning Based Lipreading for Video Captioning

Journal Title: Engineering and Technology Journal - Year 2024, Vol 9, Issue 05

Abstract

Visual speech recognition, often referred to as lipreading, has garnered significant attention in recent years owing to its potential applications in fields such as human-computer interaction, accessibility technology, and biometric security. This paper explores the challenges and advances in lipreading, which involves deciphering speech from visual cues, primarily movements of the lips, tongue, and teeth. Although it is an essential aspect of human communication, lipreading presents inherent difficulties, especially in noisy environments or when contextual information is limited. The McGurk effect, in which conflicting audio and visual cues produce perceptual illusions, highlights this complexity. Human lipreading performance varies widely, and even hearing-impaired individuals achieve relatively low accuracy rates. Automating lipreading with machine learning has therefore emerged as a promising solution, with potential applications ranging from silent dictation in public spaces to biometric authentication. Visual speech recognition methods can be broadly categorized into those that model whole words and those that model visemes, the visually distinguishable counterparts of phonemes. Word-based approaches suit isolated word recognition, whereas viseme-based techniques are better suited to continuous speech recognition. This study proposes a novel deep learning architecture for lipreading that combines Conv3D layers for spatiotemporal feature extraction with bidirectional LSTM layers for sequence modelling. The proposed model demonstrates significant improvements in lipreading accuracy, outperforming traditional methods on benchmark datasets. The practical implications of automated lipreading extend beyond accessibility technology to biometric identity verification, security surveillance, and enhanced communication aids for individuals with hearing impairments.
This paper provides insights into the advancements, challenges, and future directions of visual speech recognition research, paving the way for innovative applications in diverse domains.
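The architecture described in the abstract (Conv3D layers for spatiotemporal feature extraction feeding bidirectional LSTM layers for sequence modelling) can be sketched in PyTorch. This is a minimal illustration, not the paper's actual configuration: the channel counts, LSTM width, 46x140 mouth-crop size, and 41-way output vocabulary are all assumptions made for the example.

```python
import torch
import torch.nn as nn

class LipReadingNet(nn.Module):
    """Illustrative Conv3D + bidirectional-LSTM lipreading sketch.

    Input: batches of grayscale mouth-region crops, shaped
    (batch, 1, frames, 46, 140). All sizes here are assumptions,
    not taken from the paper.
    """

    def __init__(self, vocab_size=41):
        super().__init__()
        # Spatiotemporal feature extraction: 3D convolutions pool only
        # the spatial axes so the time axis is preserved for the LSTMs.
        self.features = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
        )
        # After two (1, 2, 2) pools on 46x140 crops: 64 * 11 * 35 features per frame.
        self.lstm = nn.LSTM(64 * 11 * 35, 128, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * 128, vocab_size)

    def forward(self, x):
        f = self.features(x)                      # (B, 64, T, 11, 35)
        f = f.permute(0, 2, 1, 3, 4).flatten(2)   # (B, T, 64*11*35)
        out, _ = self.lstm(f)                     # (B, T, 256)
        return self.head(out)                     # per-frame logits over the vocab
```

A CTC-style loss over the per-frame logits would be a natural training objective for continuous speech, though the paper's actual training and decoding scheme is not specified here.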

Authors and Affiliations

Sankalp Kala, Prof. Sridhar Ranganathan

Related Articles

Harmony in Nodes: Exploring Efficiency and Resilience in Distributed Systems

This research, titled "Harmony in Nodes: Exploring Efficiency and Resilience in Distributed Systems," delves into the intricate dynamics of distributed systems, aiming to uncover the delicate balance required for optimal...

A NEEDS ASSESSMENT TO EXPLORE THE FEASIBILITY OF A UNIFIED EVALUATION FRAMEWORK FOR DIVERSE JOB SPECIALIZATIONS IN THE JOII WORKSTREAM

This study explores the possibility and potential benefits of creating a standardized evaluation framework for various job specializations within Joii Workstream. Joii Workstream is a business process outsourcing (BPO) c...

Access and Management of Electronic Information Resources in Umaru Musa Yar’adua University Library, Katsina State Nigeria

The study investigated the access and management of electronic information resources in Umaru Musa Yar’adua University Library, Katsina. The study adopted a quantitative approach as its research paradigm, with survey as resear...

AI-Powered Fraud Detection in Auditing Using Machine Learning and Deep Learning Techniques

Financial fraud poses threats to the transparency and integrity of financial systems and therefore requires more advanced detection methods in auditing. This study proposes the application of artificial intelligence, i.e...

DATA PROCESSING PROCEDURE FOR DSRC PROBE-BASED ADVANCED TRAVELER INFORMATION SYSTEM ON SIGNALIZED ARTERIALS

When faced with traffic congestion on the road, drivers are eager to avoid it by diverting to a less congested route using real-time traffic information. To meet the demand from the public, advanced traveler information...

  • EP ID EP735130
  • DOI 10.47191/etj/v9i05.08

How To Cite

Sankalp Kala, Prof. Sridhar Ranganathan (2024). Deep Learning Based Lipreading for Video Captioning. Engineering and Technology Journal, 9(05), -. https://europub.co.uk/articles/-A-735130