Integrating R and Hadoop for Big Data Analysis

Journal Title: Revista Romana de Statistica - Year 2014, Vol 62, Issue 2

Abstract

Analyzing and working with big data can be very difficult with classical tools such as relational database management systems or desktop software packages for statistics and visualization. Instead, big data requires large clusters with hundreds or even thousands of computing nodes. Official statistics is increasingly considering big data for deriving new statistics, because big data sources could produce more relevant and timely statistics than traditional sources. One of the software tools widely and successfully used for storing and processing big data sets on clusters of commodity hardware is Hadoop. The Hadoop framework contains libraries, a distributed file system (HDFS), and a resource-management platform, and it implements a version of the MapReduce programming model for large-scale data processing. In this paper we investigate the possibilities of integrating Hadoop with R, a popular software environment for statistical computing and data visualization. We present three ways of integrating them (R with Streaming, Rhipe, and RHadoop) and emphasize the advantages and disadvantages of each solution.
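
As a rough illustration of the MapReduce programming model mentioned in the abstract, the sketch below runs a word count through map, shuffle, and reduce steps in a single Python process. It is not Hadoop code and is not taken from the paper: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are hypothetical, and on a real cluster the framework would distribute these steps across nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, standing in for the framework's shuffle step."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases (all in one process, purely for illustration)."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input, not data from the paper.
    sample = ["big data needs big clusters", "R meets Hadoop"]
    print(word_count(sample))
```

The phases are kept as separate functions because that separation is what lets a framework such as Hadoop run many mappers and reducers in parallel on different parts of the data.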

Authors and Affiliations

Bogdan Oancea, Raluca Mariana Dragoescu

How To Cite

Bogdan Oancea, Raluca Mariana Dragoescu (2014). Integrating R and Hadoop for Big Data Analysis. Revista Romana de Statistica, 62(2), 83-94. https://europub.co.uk/articles/-A-94539