Enhancing the performance of distributed big data processing systems using Hadoop and Polybase

Abstract

<p>An approach to improving the performance of distributed information systems by combining a Hadoop cluster with the PolyBase component of SQL Server was considered. The relevance of the problem stems from the need to process Big Data held in different representations in order to meet the diverse needs of business projects. Methods and technologies for building hybrid data warehouses from SQL and NoSQL data were analyzed. It was shown that the most common technology for Big Data processing at present is the Hadoop distributed computing environment. Existing technologies for organizing and accessing data in a Hadoop cluster from SQL-like DBMSs through connectors were analyzed. Comparative quantitative estimates were presented for the Hive and Sqoop connectors when exporting data to a Hadoop warehouse. The special features of Big Data processing in a Hadoop-based distributed cluster computing architecture were analyzed. The features of PolyBase, a SQL Server component that bridges SQL Server and Hadoop data of SQL and NoSQL types, were described. The composition of a model computing setup, based on a virtual machine, for the joint configuration of PolyBase and Hadoop for solving test tasks was described. A methodological toolset for installing and configuring Hadoop and SQL Server PolyBase software was developed, taking into account constraints on computing capacity. Queries that use PolyBase and the Hadoop data warehouse for Big Data processing were considered. Absolute and relative metrics were proposed to assess system performance.
For a large volume of test data, experimental results were presented and analyzed; they showed improved performance of the distributed information system in terms of query execution time and the memory taken up by the temporary tables created during execution. A comparative analysis of the studied technology against existing Hadoop cluster connectors showed the advantage of PolyBase over the Sqoop and Hive connectors. The results of the research could be used by organizations in scientific and training experiments when implementing modern IT technologies.</p>

Authors and Affiliations

Sergii Minukhin, Victor Fedko, Yurii Gnusov

Download PDF file
  • EP ID EP528054
  • DOI 10.15587/1729-4061.2018.139630

How To Cite

Sergii Minukhin, Victor Fedko, Yurii Gnusov (2018). Enhancing the performance of distributed big data processing systems using Hadoop and Polybase. Eastern-European Journal of Enterprise Technologies, 4(2), 16-28. https://europub.co.uk/articles/-A-528054