Enhancing Image Captioning and Auto-Tagging Through a FCLN with Faster R-CNN Integration

Journal Title: Information Dynamics and Applications - Year 2024, Vol 3, Issue 1

Abstract

Automated image captioning, the task of generating descriptive text for an image, depends on combining Natural Language Processing (NLP) with computer vision. This study introduces the Fully Convolutional Localization Network (FCLN), an approach that performs localization and description together in a single forward pass. The network preserves spatial information and avoids loss of detail, and it streamlines training through consistent optimization. The FCLN is built on a Convolutional Neural Network (CNN) that extracts salient image features. At the center of the architecture is a Localization Layer, which is pivotal to both precise object detection and caption generation. The architecture combines a region detection network, similar to the Faster Region-based Convolutional Neural Network (Faster R-CNN), with a captioning network, which allows it to produce contextually meaningful captions. The Faster R-CNN framework provides region-based object detection, giving the model precise contextual understanding and information about inter-object relationships, while a Long Short-Term Memory (LSTM) network generates the captions. This integration improves caption accuracy, particularly in complex scenes. Evaluations on the Microsoft Common Objects in Context (MS COCO) test server show the model outperforming existing benchmarks, underscoring its ability to generate precise and context-rich image captions.
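The abstract does not describe how candidate regions are filtered before captioning. As a rough illustration only, region detectors in the Faster R-CNN family commonly prune overlapping proposals with non-maximum suppression (NMS) based on intersection over union (IoU). The sketch below shows that generic step in plain Python; the function names `iou` and `nms`, the box format, and the 0.5 threshold are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical proposals: two heavily overlapping boxes and one separate box.
proposals = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
confidences = [0.9, 0.8, 0.7]
kept = nms(proposals, confidences)
```

With these hypothetical inputs, the second box overlaps the first with an IoU of 81/119 (about 0.68) and is suppressed, so `kept` holds indices 0 and 2. In a captioning pipeline of the kind the abstract outlines, only the surviving regions would be passed on to the language model.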

Authors and Affiliations

Shalaka Prasad Deore, Taibah Sohail Bagwan, Prachiti Sunil Bhukan, Harsheen Tejindersingh Rajpal, Shantanu Bharat Gade

  • EP ID EP732671
  • DOI https://doi.org/10.56578/ida030102

How To Cite

Shalaka Prasad Deore, Taibah Sohail Bagwan, Prachiti Sunil Bhukan, Harsheen Tejindersingh Rajpal, Shantanu Bharat Gade (2024). Enhancing Image Captioning and Auto-Tagging Through a FCLN with Faster R-CNN Integration. Information Dynamics and Applications, 3(1), -. https://europub.co.uk/articles/-A-732671