Deep Learning Based Lipreading for Video Captioning
Journal Title: Engineering and Technology Journal - Year 2024, Vol 9, Issue 05
Abstract
Visual speech recognition, often referred to as lipreading, has garnered significant attention in recent years due to its potential applications in fields such as human-computer interaction, accessibility technology, and biometric security systems. This paper explores the challenges and advancements in the field of lipreading, which involves deciphering speech from visual cues, primarily movements of the lips, tongue, and teeth. Despite being an essential aspect of human communication, lipreading presents inherent difficulties, especially in noisy environments or when contextual information is limited. The McGurk effect, in which conflicting audio and visual cues produce perceptual illusions, highlights the complexity of lipreading. Human lipreading performance varies widely, with even hearing-impaired individuals achieving relatively low accuracy rates. Automating lipreading using machine learning techniques has emerged as a promising solution, with potential applications ranging from silent dictation in public spaces to biometric authentication systems. Visual speech recognition methods can be broadly categorized into those that recognize whole words and those that model visemes, the visually distinguishable units corresponding to phonemes. While word-based approaches are suitable for isolated word recognition, viseme-based techniques are better suited for continuous speech recognition tasks. This study proposes a novel deep learning architecture for lipreading, leveraging Conv3D layers for spatiotemporal feature extraction and bidirectional LSTM layers for sequence modelling. The proposed model demonstrates significant improvements in lipreading accuracy, outperforming traditional methods on benchmark datasets. The practical implications of automated lipreading extend beyond accessibility technology to include biometric identity verification, security surveillance, and enhanced communication aids for individuals with hearing impairments.
This paper provides insights into the advancements, challenges, and future directions of visual speech recognition research, paving the way for innovative applications in diverse domains.
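The Conv3D-plus-bidirectional-LSTM pipeline described above hinges on one piece of shape bookkeeping: the convolutional blocks must preserve the time axis so that each input frame survives as one timestep for the recurrent layers, while the spatial dimensions are pooled down. The sketch below illustrates that bookkeeping in plain Python; the clip size (75 frames of 46x140 mouth crops), the number of blocks, and all kernel/pooling sizes are illustrative assumptions, not the paper's actual hyperparameters.

```python
# Shape bookkeeping for a Conv3D + BiLSTM lipreading front end.
# All layer sizes below are assumed for illustration.

def conv3d_out(size, kernel, stride=1, pad=0):
    """Output length of one dimension after a 3D convolution."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel):
    """Output length after max pooling (stride equal to kernel)."""
    return size // kernel

# Assumed input clip: 75 frames of 46x140 grayscale mouth crops.
T, H, W = 75, 46, 140

# Three Conv3D blocks: 3x3x3 kernels with padding 1 keep every axis
# the same size, then 1x2x2 pooling halves only the spatial axes.
# The frame count T is untouched, so it becomes the BiLSTM's
# sequence length.
for _ in range(3):
    T = conv3d_out(T, kernel=3, pad=1)        # time axis preserved
    H = pool_out(conv3d_out(H, 3, pad=1), 2)  # spatial axes shrink
    W = pool_out(conv3d_out(W, 3, pad=1), 2)

# Per frame, the remaining H*W*channels features are flattened into a
# vector, giving a (T, features) sequence for the bidirectional LSTM.
print(T, H, W)  # -> 75 5 17
```

Keeping the temporal resolution intact through the convolutional stack is what allows a frame-level sequence model (and, in many lipreading systems, a CTC-style loss over that sequence) to be applied downstream.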
Authors and Affiliations
Sankalp Kala, Prof. Sridhar Ranganathan