Deep Learning Based Lipreading for Video Captioning
Journal Title: Engineering and Technology Journal - Year 2024, Vol 9, Issue 05
Abstract
Visual speech recognition, often referred to as lipreading, has garnered significant attention in recent years due to its potential applications in fields such as human-computer interaction, accessibility technology, and biometric security systems. This paper explores the challenges and advancements in the field of lipreading, which involves deciphering speech from visual cues, primarily movements of the lips, tongue, and teeth. Despite being an essential aspect of human communication, lipreading presents inherent difficulties, especially in noisy environments or when contextual information is limited. The McGurk effect, in which conflicting audio and visual cues produce perceptual illusions, highlights the complexity of the task. Human lipreading performance varies widely, with hearing-impaired individuals achieving relatively low accuracy rates. Automating lipreading with machine learning has emerged as a promising solution, with potential applications ranging from silent dictation in public spaces to biometric authentication systems. Visual speech recognition methods can be broadly categorized into those that model whole words and those that model visemes, the visually distinguishable units that correspond to phonemes. While word-based approaches are suitable for isolated word recognition, viseme-based techniques are better suited for continuous speech recognition. This study proposes a novel deep learning architecture for lipreading that leverages Conv3D layers for spatiotemporal feature extraction and bidirectional LSTM layers for sequence modelling. The proposed model demonstrates significant improvements in lipreading accuracy, outperforming traditional methods on benchmark datasets. The practical implications of automated lipreading extend beyond accessibility technology to biometric identity verification, security surveillance, and enhanced communication aids for individuals with hearing impairments. This paper provides insights into the advancements, challenges, and future directions of visual speech recognition research, paving the way for innovative applications in diverse domains.
Authors and Affiliations
Sankalp Kala, Prof. Sridhar Ranganathan
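To make the architecture described in the abstract concrete, the sketch below shows a minimal Conv3D plus bidirectional LSTM lipreading model in Keras. This is not the authors' published code: the input shape (75 grayscale mouth-region frames of 46x140 pixels), the filter counts, and the 41-class character vocabulary are illustrative assumptions, and models of this kind are typically trained with a CTC loss over character sequences.

import tensorflow as tf
from tensorflow.keras import layers, models

NUM_FRAMES, H, W, C = 75, 46, 140, 1   # assumed clip shape: frames x height x width x channels
VOCAB_SIZE = 41                        # assumed character set, including a CTC blank token

model = models.Sequential([
    layers.Input(shape=(NUM_FRAMES, H, W, C)),
    # Spatiotemporal feature extraction with stacked 3D convolutions;
    # pooling shrinks only the spatial dimensions, preserving the time axis
    layers.Conv3D(32, 3, padding='same', activation='relu'),
    layers.MaxPool3D(pool_size=(1, 2, 2)),
    layers.Conv3D(64, 3, padding='same', activation='relu'),
    layers.MaxPool3D(pool_size=(1, 2, 2)),
    layers.Conv3D(96, 3, padding='same', activation='relu'),
    layers.MaxPool3D(pool_size=(1, 2, 2)),
    # Flatten each frame's feature maps into a vector, keeping the sequence
    layers.TimeDistributed(layers.Flatten()),
    # Bidirectional LSTMs model the temporal dynamics of lip movement
    layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
    layers.Dropout(0.5),
    layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
    layers.Dropout(0.5),
    # Per-frame distribution over characters, decoded with CTC at inference
    layers.Dense(VOCAB_SIZE, activation='softmax'),
])
model.summary()

The design mirrors the split described in the abstract: the Conv3D front end captures short-range spatiotemporal patterns around the mouth, while the bidirectional LSTM stack integrates evidence across the whole utterance in both temporal directions before the per-frame classification layer.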