A Scalable Framework to Analyze Data from Heterogeneous Sources at Different Levels of Granularity

Journal Title: Information Dynamics and Applications - Year 2022, Vol 1, Issue 1

Abstract

An enormous amount of data exists in many different formats, including relational databases (MsSql, MySQL, etc.), data repositories (.txt, html, pdf, etc.), and NoSQL stores (MongoDB, etc.). Processing, storing, and managing this data is complicated by the varied locations in which it is stored. If combined, data from several sites can yield a great deal of important information, and many researchers have therefore suggested methods to extract, examine, and integrate it. To manage heterogeneous data, researchers have proposed data warehouses and big data platforms as solutions. However, each of these methods has limitations when it comes to handling a variety of data. It remains necessary to comprehend and use this information, and to evaluate quantities of data that grow day by day. This paper proposes a framework that facilitates data extraction from a variety of sources, such as databases, data repositories, and NoSQL stores. It involves two steps: first, it extracts the pertinent data; second, it identifies the machine learning algorithm to analyze that data. The framework was tested on a variety of datasets to extract and integrate data from diverse sources, and the integrated dataset was found to perform better than the individual datasets in terms of accuracy, management, storage, and other factors. Thus, our prototype scales and functions effectively as the number of heterogeneous data sources increases.
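As a rough illustration of the first step described in the abstract (extracting records from heterogeneous sources and integrating them), here is a minimal sketch using only the Python standard library. It is not the authors' implementation. SQLite stands in for a relational database, an in-memory CSV string for a flat-file repository, and JSON lines for documents exported from a NoSQL store. The table name `items`, the field names, the shared `id` key, and all values are hypothetical toy data. The second step, choosing a machine learning algorithm for the integrated dataset, is not shown.

```python
import csv
import io
import json
import sqlite3


def from_relational(conn):
    """Read rows from a relational table (SQLite stands in for MsSql/MySQL)."""
    cur = conn.execute("SELECT id, price FROM items")
    return [{"id": row[0], "price": row[1]} for row in cur]


def from_flat_file(text):
    """Read rows from a delimited text file (here an in-memory CSV string)."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"id": int(r["id"]), "category": r["category"]} for r in reader]


def from_documents(lines):
    """Parse JSON documents, one per line, as a document store might export them."""
    return [json.loads(line) for line in lines]


def integrate(*sources):
    """Merge records that share the same 'id' into one combined record."""
    merged = {}
    for source in sources:
        for record in source:
            merged.setdefault(record["id"], {}).update(record)
    return [merged[key] for key in sorted(merged)]


def build_demo():
    """Build three toy sources (hypothetical data) and integrate them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER, price INTEGER)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, 34), (2, 51)])
    flat = "id,category\n1,tools\n2,books\n"
    docs = ['{"id": 1, "label": "yes"}', '{"id": 3, "label": "no"}']
    return integrate(from_relational(conn), from_flat_file(flat), from_documents(docs))


if __name__ == "__main__":
    for row in build_demo():
        print(row)
```

Records are joined on a shared key, so a record that appears in only one source (here `id` 3) is kept with whatever fields that source provides. A real deployment would also need schema mapping and conflict resolution, which this sketch leaves out.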

Authors and Affiliations

Iqbal Hasan, S. A. M. Rizvi, Majid Zaman, Waseem Jeelani Bakshi, Sheikh Amir Fayaz

  • EP ID EP732616
  • DOI https://doi.org/10.56578/ida010104

How To Cite

Iqbal Hasan, S. A. M. Rizvi, Majid Zaman, Waseem Jeelani Bakshi, Sheikh Amir Fayaz (2022). A Scalable Framework to Analyze Data from Heterogeneous Sources at Different Levels of Granularity. Information Dynamics and Applications, 1(1), -. https://europub.co.uk/articles/-A-732616