Integrating R and Hadoop for Big Data Analysis

Journal Title: Revista Romana de Statistica - Year 2014, Vol 62, Issue 2

Abstract

Analyzing and working with big data can be very difficult with classical tools such as relational database management systems or desktop software packages for statistics and visualization. Instead, big data processing typically requires large clusters with hundreds or even thousands of computing nodes. Official statistics is increasingly considering big data for deriving new statistics, because big data sources could produce more relevant and timely statistics than traditional sources. Hadoop is one of the software tools most widely and successfully used for storing and processing big data sets on clusters of commodity hardware. The Hadoop framework contains libraries, a distributed file system (HDFS) and a resource-management platform, and it implements a version of the MapReduce programming model for large-scale data processing. In this paper we investigate the possibilities of integrating Hadoop with R, a popular software environment for statistical computing and data visualization. We present three ways of integrating them (R with Streaming, Rhipe and RHadoop) and we emphasize the advantages and disadvantages of each solution.
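To make the MapReduce model behind the Streaming approach more concrete, here is a minimal sketch in Python. It is not R and it is not taken from the paper. It shows a word-count mapper and reducer that exchange tab-separated key/value lines, which is the convention Hadoop Streaming uses on stdin/stdout. The shuffle-and-sort phase that Hadoop performs between map and reduce is simulated locally with a plain sort. The function names `mapper`, `reducer` and `run_local` are hypothetical and chosen only for illustration.

```python
from itertools import groupby


def mapper(lines):
    """Map phase: emit one 'word<TAB>1' record per word, as a Streaming mapper would print to stdout."""
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Reduce phase: records arrive sorted by key, so all counts for one word are adjacent."""
    parsed = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce on a single machine (a stand-in for the cluster)."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    # Hypothetical sample input; on a cluster the lines would come from files stored in HDFS.
    sample = ["big data needs big clusters", "data"]
    for record in run_local(sample):
        print(record)
```

On a real cluster, the map and reduce steps would be two separate executables passed to the Streaming jar through its `-mapper` and `-reducer` options. Hadoop would then handle splitting the input, sorting by key and distributing the work. In the R-with-Streaming approach the same two roles are played by R scripts instead of Python ones.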

Authors and Affiliations

Bogdan Oancea, Raluca Mariana Dragoescu

How To Cite

Bogdan Oancea, Raluca Mariana Dragoescu (2014). Integrating R and Hadoop for Big Data Analysis. Revista Romana de Statistica, 62(2), 83-94. https://europub.co.uk/articles/-A-94539