Enhancing Image Captioning and Auto-Tagging Through a FCLN with Faster R-CNN Integration

Journal Title: Information Dynamics and Applications - Year 2024, Vol 3, Issue 1

Abstract

Automated image captioning, the task of generating descriptive text for images, depends on combining Natural Language Processing (NLP) with computer vision. This study introduces the Fully Convolutional Localization Network (FCLN), an approach that performs localization and description together in a single forward pass. The network preserves spatial information, avoids loss of detail, and is trained with one consistent optimization procedure. A Convolutional Neural Network (CNN) forms the foundation of FCLN and extracts salient image features. Central to the architecture is a Localization Layer, which supports precise object detection and caption generation. FCLN combines a region detection network, similar to Faster Region-based CNN (Faster R-CNN), with a captioning network, and together they produce contextually meaningful image captions. The Faster R-CNN framework provides region-based object detection and captures context and inter-object relationships, while a Long Short-Term Memory (LSTM) network generates the captions. This integration improves caption accuracy, particularly in complex scenes. Evaluations on the Microsoft Common Objects in Context (MS COCO) test server indicate that the model outperforms existing benchmarks in generating precise, context-rich image captions.

Authors and Affiliations

Shalaka Prasad Deore, Taibah Sohail Bagwan, Prachiti Sunil Bhukan, Harsheen Tejindersingh Rajpal, Shantanu Bharat Gade

  • EP ID EP732671
  • DOI https://doi.org/10.56578/ida030102

How To Cite

Shalaka Prasad Deore, Taibah Sohail Bagwan, Prachiti Sunil Bhukan, Harsheen Tejindersingh Rajpal, Shantanu Bharat Gade (2024). Enhancing Image Captioning and Auto-Tagging Through a FCLN with Faster R-CNN Integration. Information Dynamics and Applications, 3(1), -. https://europub.co.uk/articles/-A-732671