A Scalable Framework to Analyze Data from Heterogeneous Sources at Different Levels of Granularity
Journal Title: Information Dynamics and Applications - Year 2022, Vol 1, Issue 1
Abstract
An enormous amount of data exists in many different formats, including relational databases (MsSql, MySQL, etc.), file-based repositories (.txt, html, pdf, etc.), and NoSQL stores (MongoDB, etc.). Processing, storing, and managing this data is complicated by the varied locations in which it is held. If combined, data from several sites can yield a great deal of important information. Many researchers have therefore suggested methods to extract, examine, and integrate such data, and data warehouses and big data have been proposed as solutions for managing heterogeneous data. However, each of these approaches has limitations when it comes to handling a variety of data. It is necessary to comprehend and use this information, and to evaluate the massive quantities that grow day by day. We propose a framework that facilitates data extraction from a variety of sources, such as databases, file-based data sources, and NoSQL stores. It involves two steps: first, it extracts the pertinent data, and second, it identifies the machine learning algorithm to be used to analyze the data. The framework was tested on a variety of datasets to extract and integrate data from diverse sources, and the integrated dataset was found to perform better than the individual datasets in terms of accuracy, management, storage, and other factors. Thus, our prototype scales and functions effectively as the number of heterogeneous data sources increases.
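The extraction and integration step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `customers` table, the sample records, and the choice of a shared `id` key are all assumptions made for illustration, and in-memory SQLite, CSV text, and a JSON string stand in for a relational database, a flat-file repository, and a document store. The second step, choosing a machine learning algorithm, is not covered here.

```python
import csv
import io
import json
import sqlite3


def extract_from_sql(conn):
    """Pull rows from a relational source (SQLite stands in for MsSql/MySQL)."""
    cur = conn.execute("SELECT id, name FROM customers")
    return [{"id": row[0], "name": row[1]} for row in cur]


def extract_from_text(text):
    """Pull rows from a flat-file source (CSV text stands in for a .txt file)."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"id": int(r["id"]), "city": r["city"]} for r in reader]


def extract_from_documents(raw_json):
    """Pull records from a document source (a JSON string stands in for MongoDB)."""
    return [{"id": d["id"], "orders": d.get("orders", 0)} for d in json.loads(raw_json)]


def integrate(*sources):
    """Merge records from all sources into one record per shared `id` (assumed key)."""
    merged = {}
    for records in sources:
        for rec in records:
            merged.setdefault(rec["id"], {}).update(rec)
    return [merged[key] for key in sorted(merged)]


def build_demo_dataset():
    """Build three hypothetical sources and return the integrated dataset."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
    flat_file = "id,city\n1,Delhi\n2,Srinagar\n"
    documents = '[{"id": 1, "orders": 3}, {"id": 3, "orders": 1}]'
    return integrate(
        extract_from_sql(conn),
        extract_from_text(flat_file),
        extract_from_documents(documents),
    )
```

Calling `build_demo_dataset()` returns one dictionary per `id`, each holding whatever fields the individual sources supplied for that key. A record that appears in only one source is kept with the fields it has, which is one possible design choice among several for handling partial overlap.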
Authors and Affiliations
Iqbal Hasan, S. A. M. Rizvi, Majid Zaman, Waseem Jeelani Bakshi, Sheikh Amir Fayaz