Enhancing Image Captioning and Auto-Tagging Through a FCLN with Faster R-CNN Integration

Journal Title: Information Dynamics and Applications - Year 2024, Vol 3, Issue 1

Abstract

In automated image captioning, which entails generating descriptive text for images, the fusion of Natural Language Processing (NLP) and computer vision techniques is essential. This study introduces the Fully Convolutional Localization Network (FCLN), a novel approach that addresses localization and description jointly within a single forward pass. It preserves spatial information, avoids detail loss, and streamlines training through consistent optimization. The foundation of FCLN is a Convolutional Neural Network (CNN) that extracts salient image features. Central to the architecture is a Localization Layer, which is pivotal to precise object detection and caption generation. The FCLN architecture combines a region detection network, reminiscent of the Faster Region-based CNN (Faster R-CNN), with a captioning network, enabling the production of contextually meaningful image captions. The Faster R-CNN framework provides region-based object detection, which supports precise contextual understanding and captures inter-object relationships. A Long Short-Term Memory (LSTM) network then generates the captions. This integration improves caption accuracy, particularly in complex scenes. Evaluations conducted on the Microsoft Common Objects in Context (MS COCO) test server show the model outperforming existing benchmarks, underscoring its efficacy in generating precise and context-rich image captions.
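As an illustration of the region detection step, the sketch below shows greedy non-maximum suppression (NMS), the pruning technique that Faster R-CNN-style detectors typically apply to overlapping candidate boxes before passing regions on to later stages. The abstract does not state whether or how the authors apply NMS, so the function names (`iou`, `select_regions`), the `(x1, y1, x2, y2)` box format, and the threshold and `top_k` values are all assumptions made for this example, not details taken from the paper.

```python
from typing import List, Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2) with x1 < x2, y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_regions(
    boxes: List[Box],
    scores: List[float],
    iou_threshold: float = 0.7,  # assumed value, not from the paper
    top_k: int = 300,            # assumed value, not from the paper
) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap an already-kept box by more than iou_threshold.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if len(kept) >= top_k:
            break
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


# Two heavily overlapping candidates and one separate candidate.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(select_regions(boxes, scores, iou_threshold=0.5))
```

In a captioning pipeline of the kind the abstract describes, the surviving regions would each be described by the captioning network; that stage is not sketched here.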

Authors and Affiliations

Shalaka Prasad Deore, Taibah Sohail Bagwan, Prachiti Sunil Bhukan, Harsheen Tejindersingh Rajpal, Shantanu Bharat Gade

  • EP ID EP732671
  • DOI https://doi.org/10.56578/ida030102

How To Cite

Shalaka Prasad Deore, Taibah Sohail Bagwan, Prachiti Sunil Bhukan, Harsheen Tejindersingh Rajpal, Shantanu Bharat Gade (2024). Enhancing Image Captioning and Auto-Tagging Through a FCLN with Faster R-CNN Integration. Information Dynamics and Applications, 3(1), -. https://europub.co.uk/articles/-A-732671