Software Abstractions for Large-Scale Deep Learning Models in Big Data Analytics

Abstract

Big data analytics aims to analyze datasets of high volume, velocity, and variety for large-scale business intelligence problems. These workloads are normally distributed across massively parallel analytical systems. Deep learning, part of a broader family of machine learning methods based on learning representations of data, plays a significant role in information analysis by adding value to massive amounts of unsupervised data. A core research area is the development of deep learning algorithms that automatically extract complex data representations at a higher level of abstraction from massive volumes of data. In this paper, we present the latest research trends in parallel algorithms, optimization techniques, tools, and libraries for big data analytics and deep learning on various parallel architectures. We identify and analyze the basic building blocks of deep learning, such as Restricted Boltzmann Machines (RBM) and Deep Belief Networks (DBN), with a view to parallelizing deep learning models. We propose a parallel software API based on PyTorch, the Hadoop Distributed File System (HDFS), Apache Hadoop MapReduce, and MapReduce Job (MRJob) for developing large-scale deep learning models. We obtained a reduction of about 5-30% in the execution time of the deep auto-encoder model, even on a single-node Hadoop cluster. Furthermore, the complexity of developing code for multi-layer deep learning models is significantly reduced.
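
The abstract does not show the proposed API, so the following is only a minimal sketch of the general map-reduce pattern for data-parallel training that such an API might build on. It is not the authors' implementation. To stay self-contained it uses plain Python instead of PyTorch, HDFS, or MRJob, and it trains a toy tied-weight linear auto-encoder with a single scalar weight. The names `mapper`, `reducer`, `train`, and `shards` are hypothetical.

```python
# Hypothetical sketch of map-reduce style data-parallel training.
# Assumption: a toy tied-weight linear auto-encoder, x -> w*x -> w*(w*x),
# stands in for the deep auto-encoder mentioned in the abstract.
from functools import reduce


def mapper(shard, w):
    """Map step: partial gradient sum, squared-error sum, and count for one shard."""
    grad, loss = 0.0, 0.0
    for x in shard:
        err = w * w * x - x          # reconstruction error for this sample
        loss += err * err
        grad += 2.0 * err * 2.0 * w * x  # d/dw of (w^2 * x - x)^2
    return grad, loss, len(shard)


def reducer(a, b):
    """Reduce step: element-wise sum of two partial results."""
    return tuple(p + q for p, q in zip(a, b))


def train(shards, w=0.5, lr=0.01, epochs=200):
    """Run one map-reduce pass per epoch, then apply a gradient step."""
    history = []
    for _ in range(epochs):
        grad, loss, n = reduce(reducer, (mapper(s, w) for s in shards))
        w -= lr * grad / n           # average gradient over all samples
        history.append(loss / n)     # mean reconstruction error
    return w, history


# Each inner list plays the role of one data split handled by one mapper.
shards = [[1.0, 2.0], [0.5, 1.5], [3.0]]
w, history = train(shards)
```

Because the reducer only sums partial results, the update is the same however the data is split across mappers, which is the property that lets this pattern scale out to more nodes.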

Authors and Affiliations

Ayaz H. Khan, Ali Mustafa Qamar, Aneeq Yusuf, Rehanullah Khan

  • EP ID EP552373
  • DOI 10.14569/IJACSA.2019.0100469

How To Cite

Ayaz H. Khan, Ali Mustafa Qamar, Aneeq Yusuf, Rehanullah Khan (2019). Software Abstractions for Large-Scale Deep Learning Models in Big Data Analytics. International Journal of Advanced Computer Science & Applications, 10(4), 557-566. https://europub.co.uk/articles/-A-552373