A Close-Up View About Spark in Big Data Jurisdiction

Journal Title: International Journal of Engineering Research and Applications - Year 2018, Vol 8, Issue 1

Abstract

Big data is a term now used ubiquitously in the distributed computing paradigm on the web. As the name indicates, it refers to collections of very large data sets, measured in petabytes, exabytes, and beyond, together with the related systems and the algorithms used to analyze this enormous volume of data. Hadoop has proven to be the go-to solution for processing enormous data sets. MapReduce is a prominent solution for computations that complete in one pass, but it is not very efficient for use cases that need multi-pass computations and algorithms. The output of each stage of a job has to be stored in the file system before the next stage can begin. Consequently, this approach is slow because of disk input/output operations and replication. Additionally, the Hadoop ecosystem does not include every component needed to complete a big data use case. To run an iterative job, one has to stitch together a sequence of MapReduce jobs and execute them in order. Each of these jobs has high latency, and each depends on the completion of the previous stage. Apache Spark is one of the most widely used open-source processing engines for big data, with rich language-integrated APIs and an extensive range of libraries. It is a general framework for distributed computing that offers high performance for both batch and interactive processing. In this paper, we present a close-up view of Apache Spark, its features, and working with Spark on Hadoop. We briefly discuss Resilient Distributed Datasets (RDDs), RDD operations, features, and limitations. Spark can be used alongside MapReduce in the same Hadoop cluster or on its own as a processing framework. Finally, the paper presents a comparative analysis comparing Spark with Hadoop and MapReduce.
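The cost of chaining jobs for an iterative computation can be illustrated with a small, self-contained sketch. It does not use Hadoop or Spark; it is a toy model in plain Python, and every name in it (one_pass, run_chained_jobs, run_in_memory) is hypothetical and chosen only for illustration. The first function mimics a sequence of jobs in which each stage must write its output to the file system before the next stage can read it; the second keeps the working data in memory between passes, which is the general idea behind avoiding the intermediate disk round trips.

```python
import json
import os
import tempfile


def one_pass(values):
    """One iteration of a toy job: move each value halfway toward 10."""
    return [v + (10.0 - v) / 2.0 for v in values]


def run_chained_jobs(values, passes, workdir):
    """Toy model of chained jobs: every stage reads its input from a file
    and writes its output to a new file before the next stage starts.
    Returns the final values and the number of files written."""
    path = os.path.join(workdir, "stage_0.json")
    with open(path, "w") as f:
        json.dump(values, f)
    disk_writes = 1
    for i in range(1, passes + 1):
        with open(path) as f:
            current = json.load(f)
        path = os.path.join(workdir, f"stage_{i}.json")
        with open(path, "w") as f:
            json.dump(one_pass(current), f)
        disk_writes += 1
    with open(path) as f:
        return json.load(f), disk_writes


def run_in_memory(values, passes):
    """Toy model of an in-memory iterative job: no intermediate files.
    Returns the final values and the number of files written (always 0)."""
    current = list(values)
    for _ in range(passes):
        current = one_pass(current)
    return current, 0


if __name__ == "__main__":
    data = [0.0, 4.0, 20.0]
    with tempfile.TemporaryDirectory() as workdir:
        chained, writes = run_chained_jobs(data, 3, workdir)
    in_memory, _ = run_in_memory(data, 3)
    print(chained == in_memory, writes)
```

Both functions compute the same result; the difference is that the chained version touches the file system once per stage, and in a real cluster each of those writes would also be replicated.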

Authors and Affiliations

Firoj Parwej, Nikhat Akhtar, Dr. Yusuf Perwej

EP ID EP393365
DOI 10.9790/9622-0801022641.