A Close-Up View About Spark in Big Data Jurisdiction

Abstract

Big data is a term now used ubiquitously in the distributed computing paradigm on the web. As the name suggests, it refers to collections of very large data sets, measured in petabytes, exabytes, and beyond, together with the related systems and the algorithms used to analyze this enormous volume of data. Hadoop has proven to be the go-to solution for processing enormous data sets. MapReduce is a well-suited solution for computations that require a single pass to complete, but it is not very efficient for use cases that need multi-pass computations and algorithms. The output of each job stage has to be stored in the file system before the next stage can begin. Consequently, this method is slow because of disk input/output operations and replication. Additionally, the Hadoop ecosystem does not include every component needed to complete a big data use case. To run an iterative job, one would have to stitch together a sequence of MapReduce jobs and execute them in order. Each of these jobs has high latency, and each depends on the completion of the previous stage. Apache Spark is one of the most widely used open source processing engines for big data, with rich language-integrated APIs and an extensive range of libraries. Apache Spark is a general framework for distributed computing that offers high performance for both batch and interactive processing. In this paper, we aim to present a close-up view of Apache Spark, its features, and working with Spark using Hadoop. We briefly discuss Resilient Distributed Datasets (RDDs), RDD operations, features, and limitations. Spark can be used alongside MapReduce in the same Hadoop cluster or on its own as a processing framework. Finally, the paper presents a comparative analysis of Spark, Hadoop, and MapReduce.

Authors and Affiliations

Firoj Parwej, Nikhat Akhtar, Dr. Yusuf Perwej

Download PDF file
  • EP ID EP393365
  • DOI 10.9790/9622-0801022641

How To Cite

Firoj Parwej, Nikhat Akhtar, Dr. Yusuf Perwej (2018). A Close-Up View About Spark in Big Data Jurisdiction. International Journal of Engineering Research and Applications, 8(1), 26-41. https://europub.co.uk/articles/-A-393365