A Close-Up View About Spark in Big Data Jurisdiction
Journal Title: International Journal of Engineering Research and Applications - Year 2018, Vol 8, Issue 1
Abstract
Big data is a term now used ubiquitously in distributed computing on the web. As the name suggests, it refers to collections of very large data sets, measured in petabytes, exabytes, and beyond, together with the related systems and algorithms used to analyze this data. Hadoop has proven to be the go-to solution for processing enormous data sets. MapReduce is a good fit for computations that complete in a single pass, but it is not very efficient for use cases that need multi-pass computations and algorithms. The output of each job stage has to be stored in the file system before the next stage can begin. Consequently, this approach is slow because of disk input/output operations and replication. Additionally, the Hadoop ecosystem does not include every component needed to complete a big data use case. To run an iterative job, one has to stitch together a sequence of MapReduce jobs and execute them in order. Each of these jobs has high latency, and each depends on the completion of the previous stage. Apache Spark is one of the most widely used open-source processing engines for big data, with rich language-integrated APIs and an extensive range of libraries. Apache Spark is a general-purpose framework for distributed computing that offers high performance for both batch and interactive processing. In this paper, we present a close-up view of Apache Spark, its features, and how it works with Hadoop. We briefly discuss Resilient Distributed Datasets (RDDs), RDD operations, features, and limitations. Spark can be used alongside MapReduce in the same Hadoop cluster or on its own as a processing framework. The paper closes with a comparative analysis of Spark and Hadoop MapReduce.
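The RDD operations mentioned in the abstract fall into two groups: transformations, which are recorded lazily, and actions, which trigger the computation. The sketch below is a toy, single-process illustration of that idea in plain Python. It is not Spark's API: the class name MiniRDD and its internals are hypothetical, and it ignores partitioning, distribution, and fault tolerance. It only shows how a multi-step pipeline can be chained in memory without writing intermediate results to disk between stages.

```python
from functools import reduce


class MiniRDD:
    """Toy stand-in for an RDD (hypothetical; not Spark's actual API)."""

    def __init__(self, source, lineage=()):
        self._source = source      # base data, kept in memory
        self._lineage = lineage    # recorded transformations, not yet executed

    # Transformations: return a new MiniRDD and only record the step.
    def map(self, fn):
        return MiniRDD(self._source, self._lineage + (("map", fn),))

    def filter(self, pred):
        return MiniRDD(self._source, self._lineage + (("filter", pred),))

    def _compute(self):
        # Replay the recorded lineage as one chained, in-memory pipeline.
        items = iter(self._source)
        for kind, fn in self._lineage:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return items

    # Actions: force evaluation of the whole lineage.
    def collect(self):
        return list(self._compute())

    def reduce(self, fn):
        return reduce(fn, self._compute())


numbers = MiniRDD(list(range(1, 11)))
# Nothing is computed here; the two steps are only recorded.
even_squares = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

result = even_squares.collect()                  # action: runs filter, then map
total = even_squares.reduce(lambda a, b: a + b)  # action: replays the lineage
```

Because each MiniRDD keeps its base data and the list of recorded steps, any result can be recomputed from its lineage. In this toy model that stands in for the recomputation-based recovery usually associated with RDDs.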
Authors and Affiliations
Firoj Parwej, Nikhat Akhtar, Dr. Yusuf Perwej