Enhancing Image Captioning and Auto-Tagging Through a FCLN with Faster R-CNN Integration

Journal Title: Information Dynamics and Applications - Year 2024, Vol 3, Issue 1

Abstract

Automated image captioning, the task of generating descriptive text for images, depends on combining Natural Language Processing (NLP) with computer vision. This study introduces the Fully Convolutional Localization Network (FCLN), a novel approach that handles localization and description together in a single forward pass. It preserves spatial information, avoids loss of detail, and simplifies training through consistent optimization. FCLN is built on a Convolutional Neural Network (CNN) that extracts salient image features. At the center of the architecture is a Localization Layer, which is responsible for precise object detection and supports caption generation. The architecture combines a region detection network, modeled on the Faster Region-based CNN (Faster R-CNN), with a captioning network, so that the resulting captions are contextually meaningful. The Faster R-CNN framework provides region-based object detection, which gives the model precise context and captures relationships between objects. A Long Short-Term Memory (LSTM) network generates the captions. This integration improves caption accuracy, particularly in complex scenes. Evaluations on the Microsoft Common Objects in Context (MS COCO) test server indicate that the model outperforms existing benchmarks and generates precise, context-rich image captions.
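The data flow described in the abstract (candidate regions are scored, a subset is kept, and each kept region is captioned) can be illustrated with a toy sketch. This is not the authors' implementation: the names `Region`, `localization_layer`, `greedy_decode`, `dense_caption`, and `toy_step` are hypothetical, the top-k selection rule is an assumption, and a small lookup table stands in for the trained LSTM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Region:
    box: Box
    score: float           # confidence assigned by the (hypothetical) localization layer
    features: List[float]  # pooled feature vector for this region


def localization_layer(candidates: List[Region], top_k: int) -> List[Region]:
    """Keep the top_k highest-scoring regions (assumed selection rule)."""
    return sorted(candidates, key=lambda r: r.score, reverse=True)[:top_k]


def greedy_decode(step: Callable[[List[float], str], str],
                  features: List[float],
                  max_len: int = 5) -> List[str]:
    """Generate tokens one at a time, feeding each token back in,
    the way a recurrent captioner is unrolled at inference time."""
    tokens: List[str] = []
    prev = "<start>"
    for _ in range(max_len):
        nxt = step(features, prev)
        if nxt == "<end>":
            break
        tokens.append(nxt)
        prev = nxt
    return tokens


def dense_caption(candidates: List[Region],
                  step: Callable[[List[float], str], str],
                  top_k: int = 2) -> List[Tuple[Box, str]]:
    """Localize, then describe: one caption per kept region."""
    kept = localization_layer(candidates, top_k)
    return [(r.box, " ".join(greedy_decode(step, r.features))) for r in kept]


# Toy stand-in for a trained LSTM step: keyed by (dominant feature index, previous token).
TOY_TABLE: Dict[Tuple[int, str], str] = {
    (0, "<start>"): "a", (0, "a"): "dog", (0, "dog"): "<end>",
    (1, "<start>"): "a", (1, "a"): "ball", (1, "ball"): "<end>",
}


def toy_step(features: List[float], prev: str) -> str:
    idx = max(range(len(features)), key=features.__getitem__)
    return TOY_TABLE.get((idx, prev), "<end>")


CANDIDATES = [
    Region((10, 20, 110, 220), 0.9, [0.8, 0.2]),
    Region((0, 0, 5, 5), 0.1, [0.5, 0.5]),        # low score, dropped by top_k
    Region((150, 180, 200, 230), 0.7, [0.1, 0.9]),
]

if __name__ == "__main__":
    for box, caption in dense_caption(CANDIDATES, toy_step, top_k=2):
        print(box, caption)
```

In a real system the candidate regions and their features would come from the CNN and region detection network in one forward pass, and `toy_step` would be replaced by a learned recurrent decoder.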

Authors and Affiliations

Shalaka Prasad Deore, Taibah Sohail Bagwan, Prachiti Sunil Bhukan, Harsheen Tejindersingh Rajpal, Shantanu Bharat Gade

  • EP ID: EP732671
  • DOI: https://doi.org/10.56578/ida030102

How To Cite

Shalaka Prasad Deore, Taibah Sohail Bagwan, Prachiti Sunil Bhukan, Harsheen Tejindersingh Rajpal, Shantanu Bharat Gade (2024). Enhancing Image Captioning and Auto-Tagging Through a FCLN with Faster R-CNN Integration. Information Dynamics and Applications, 3(1), -. https://europub.co.uk/articles/-A-732671