A Scalable Framework to Analyze Data from Heterogeneous Sources at Different Levels of Granularity

Journal Title: Information Dynamics and Applications - Year 2022, Vol 1, Issue 1

Abstract

Enormous amounts of data exist in many different formats, including relational databases (MsSql, MySQL, etc.), file-based data repositories (.txt, html, pdf, etc.), and NoSQL stores (MongoDB, etc.). Because the data is stored in varied locations, processing, storing, and managing it is complicated. If combined, data from several sites can yield a great deal of important information. Many researchers have therefore suggested different methods to extract, examine, and integrate the data, and have proposed data warehouses and big data platforms as solutions for managing heterogeneous data. However, each of these methods has limitations when it comes to handling a variety of data. It is necessary to comprehend and use this information, and to evaluate the massive quantities of data that grow day by day. This paper proposes a framework that facilitates data extraction from a variety of sources, such as databases, data repositories, and NoSQL stores. It involves two steps: first, it extracts the pertinent data; second, it identifies the machine learning algorithm to analyze the data. The framework was then tested on a variety of datasets to extract and integrate data from diverse sources, and the integrated dataset was found to perform better than the individual datasets in terms of accuracy, management, storage, and other factors. Thus, our prototype scales and functions effectively as the number of heterogeneous data sources increases.
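
The extract-then-integrate idea in the abstract can be illustrated with a minimal sketch. This is not the authors' framework: the abstract does not describe its implementation, so everything here is an assumption made for illustration. An in-memory SQLite database stands in for MsSql/MySQL, a CSV string stands in for a text repository, and a JSON string stands in for MongoDB documents. The weather-station records, the function names, and the shared key `station_id` are all hypothetical.

```python
import csv
import io
import json
import sqlite3

def from_sql(conn):
    """Relational source: rows come back as tuples, so name the columns."""
    rows = conn.execute("SELECT station_id, temp_c FROM readings").fetchall()
    return [{"station_id": sid, "temp_c": t} for sid, t in rows]

def from_text(handle):
    """Flat-file source: CSV values arrive as strings and need casting."""
    return [{"station_id": int(r["station_id"]), "humidity": float(r["humidity"])}
            for r in csv.DictReader(handle)]

def from_documents(raw_json):
    """Document source: nested fields are flattened to the shared schema."""
    return [{"station_id": d["station"]["id"], "rain_mm": d["rain_mm"]}
            for d in json.loads(raw_json)]

def integrate(*sources, key="station_id"):
    """Merge records from all sources into one row per key value."""
    merged = {}
    for records in sources:
        for rec in records:
            merged.setdefault(rec[key], {key: rec[key]}).update(rec)
    return [merged[k] for k in sorted(merged)]

def build_demo():
    """Build three hypothetical sources and return the integrated rows."""
    conn = sqlite3.connect(":memory:")  # stand-in for MsSql/MySQL
    conn.execute("CREATE TABLE readings (station_id INTEGER, temp_c REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", [(1, 21.5), (2, 18.0)])
    text = io.StringIO("station_id,humidity\n1,0.40\n3,0.75\n")  # stand-in for a .txt file
    docs = '[{"station": {"id": 2}, "rain_mm": 4.2}]'  # stand-in for MongoDB documents
    return integrate(from_sql(conn), from_text(text), from_documents(docs))

if __name__ == "__main__":
    for row in build_demo():
        print(row)
```

Each extractor maps its source onto a common record shape before merging, so adding another source type only requires another extractor function. Records that appear in only one source are kept with whatever fields they have; the integrated rows would then be passed to whichever learning algorithm the second step selects.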

Authors and Affiliations

Iqbal Hasan, S. A. M. Rizvi, Majid Zaman, Waseem Jeelani Bakshi, Sheikh Amir Fayaz

  • EP ID EP732616
  • DOI https://doi.org/10.56578/ida010104

How To Cite

Iqbal Hasan, S. A. M. Rizvi, Majid Zaman, Waseem Jeelani Bakshi, Sheikh Amir Fayaz (2022). A Scalable Framework to Analyze Data from Heterogeneous Sources at Different Levels of Granularity. Information Dynamics and Applications, 1(1), -. https://europub.co.uk/articles/-A-732616