Deep Learning Based Lipreading for Video Captioning
Journal Title: Engineering and Technology Journal - Year 2024, Vol 9, Issue 05
Abstract
Visual speech recognition, often referred to as lipreading, has garnered significant attention in recent years due to its potential applications in fields such as human-computer interaction, accessibility technology, and biometric security systems. This paper explores the challenges and advancements in lipreading, which involves deciphering speech from visual cues, primarily movements of the lips, tongue, and teeth. Despite being an essential aspect of human communication, lipreading presents inherent difficulties, especially in noisy environments or when contextual information is limited. The McGurk effect, where conflicting audio and visual cues lead to perceptual illusions, highlights the complexity of the task. Human lipreading performance varies widely, with even hearing-impaired individuals achieving relatively low accuracy rates. Automating lipreading using machine learning techniques has emerged as a promising solution, with potential applications ranging from silent dictation in public spaces to biometric authentication systems. Visual speech recognition methods can be broadly categorized into those that model whole words and those that model visemes, the visually distinguishable units of speech corresponding to phonemes. While word-based approaches are suitable for isolated word recognition, viseme-based techniques are better suited for continuous speech recognition tasks. This study proposes a novel deep learning architecture for lipreading, leveraging Conv3D layers for spatiotemporal feature extraction and bidirectional LSTM layers for sequence modelling. The proposed model demonstrates significant improvements in lipreading accuracy, outperforming traditional methods on benchmark datasets. The practical implications of automated lipreading extend beyond accessibility technology to include biometric identity verification, security surveillance, and enhanced communication aids for individuals with hearing impairments.
This paper provides insights into the advancements, challenges, and future directions of visual speech recognition research, paving the way for innovative applications in diverse domains.
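The Conv3D-plus-bidirectional-LSTM pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the paper does not specify framework, layer sizes, or input resolution here, so the PyTorch framework, channel counts, the 48×100 grayscale mouth-crop input, and the character vocabulary size are all assumptions.

```python
import torch
import torch.nn as nn

class LipReadingNet(nn.Module):
    """Hypothetical sketch: Conv3D front-end for spatiotemporal features,
    bidirectional LSTM back-end for sequence modelling."""

    def __init__(self, vocab_size=28, hidden=128):
        super().__init__()
        # Spatiotemporal feature extraction over (batch, channels, frames, H, W).
        # Pooling only over the spatial dims keeps the frame count intact.
        self.conv = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
        )
        # After two 2x spatial poolings, a 48x100 crop becomes 12x25.
        self.lstm = nn.LSTM(input_size=64 * 12 * 25, hidden_size=hidden,
                            bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, vocab_size)

    def forward(self, x):
        # x: (batch, 1, frames, 48, 100) grayscale mouth crops (assumed shape)
        f = self.conv(x)                                   # (B, 64, T, 12, 25)
        b, c, t, h, w = f.shape
        f = f.permute(0, 2, 1, 3, 4).reshape(b, t, c * h * w)
        out, _ = self.lstm(f)                              # (B, T, 2*hidden)
        return self.fc(out)                                # per-frame logits
```

Feeding an 8-frame clip of shape `(2, 1, 8, 48, 100)` yields per-frame character logits of shape `(2, 8, 28)`, which could then be decoded with, for example, a CTC-style loss for continuous speech.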
Authors and Affiliations
Sankalp Kala, Prof. Sridhar Ranganathan