Software Abstractions for Large-Scale Deep Learning Models in Big Data Analytics

Abstract

The goal of big data analytics is to analyze datasets of high volume, velocity, and variety to solve large-scale business intelligence problems. These workloads are normally distributed across massively parallel analytical systems. Deep learning is part of a broader family of machine learning methods based on learning representations of data. Deep learning plays a significant role in information analysis by adding value to massive amounts of unlabeled data. A core domain of research is the development of deep learning algorithms that automatically extract high-level abstractions from complex data formats using massive volumes of data. In this paper, we present the latest research trends in the development of parallel algorithms, optimization techniques, tools, and libraries related to big data analytics and deep learning on various parallel architectures. The basic building blocks of deep learning, such as Restricted Boltzmann Machines (RBMs) and Deep Belief Networks (DBNs), are identified and analyzed for the parallelization of deep learning models. We propose a parallel software API based on PyTorch, the Hadoop Distributed File System (HDFS), Apache Hadoop MapReduce, and MapReduce Job (MRJob) for developing large-scale deep learning models. We obtained about a 5-30% reduction in the execution time of the deep auto-encoder model, even on a single-node Hadoop cluster. Furthermore, the API significantly reduces the complexity of the code needed to create multi-layer deep learning models.
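To make the data-parallel idea concrete, the following is a minimal, hypothetical sketch (not the authors' PyTorch/MRJob API) of how one RBM building block could be trained with one-step contrastive divergence (CD-1) in a map/reduce style. Each mapper computes summed CD-1 gradient estimates over its own data shard, and a reducer averages the partial sums and applies a single parameter update. It uses only the Python standard library, and all function names are illustrative assumptions. In an actual mrjob deployment, the two functions would roughly correspond to the mapper and reducer steps of an `MRJob` subclass, with shards read from HDFS; that wiring is omitted here.

```python
import math
import random

# Hypothetical sketch: data-parallel CD-1 training of a binary RBM in a
# map/reduce style. Function names are illustrative, not the paper's API.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cd1_gradient(W, b_vis, b_hid, v0, rng):
    """CD-1 gradient estimates (dW, db_vis, db_hid) for one visible vector v0."""
    n_vis, n_hid = len(b_vis), len(b_hid)
    # Positive phase: hidden probabilities given the data, then a binary sample.
    h0_prob = [sigmoid(b_hid[j] + sum(v0[i] * W[i][j] for i in range(n_vis)))
               for j in range(n_hid)]
    h0 = [1.0 if rng.random() < p else 0.0 for p in h0_prob]
    # Negative phase: one Gibbs step to reconstruct visibles, then hiddens again.
    v1_prob = [sigmoid(b_vis[i] + sum(h0[j] * W[i][j] for j in range(n_hid)))
               for i in range(n_vis)]
    h1_prob = [sigmoid(b_hid[j] + sum(v1_prob[i] * W[i][j] for i in range(n_vis)))
               for j in range(n_hid)]
    dW = [[v0[i] * h0_prob[j] - v1_prob[i] * h1_prob[j] for j in range(n_hid)]
          for i in range(n_vis)]
    db_vis = [v0[i] - v1_prob[i] for i in range(n_vis)]
    db_hid = [h0_prob[j] - h1_prob[j] for j in range(n_hid)]
    return dW, db_vis, db_hid

def map_shard(params, shard, seed):
    """Map step: sum CD-1 gradients over one shard; also return the shard size."""
    W, b_vis, b_hid = params
    rng = random.Random(seed)
    sum_W = [[0.0] * len(b_hid) for _ in b_vis]
    sum_bv = [0.0] * len(b_vis)
    sum_bh = [0.0] * len(b_hid)
    for v in shard:
        dW, dbv, dbh = cd1_gradient(W, b_vis, b_hid, v, rng)
        for i in range(len(b_vis)):
            sum_bv[i] += dbv[i]
            for j in range(len(b_hid)):
                sum_W[i][j] += dW[i][j]
        for j in range(len(b_hid)):
            sum_bh[j] += dbh[j]
    return sum_W, sum_bv, sum_bh, len(shard)

def reduce_and_update(params, partials, lr):
    """Reduce step: average partial sums over all examples and update parameters."""
    W, b_vis, b_hid = params
    total = sum(p[3] for p in partials)
    new_W = [[W[i][j] + lr * sum(p[0][i][j] for p in partials) / total
              for j in range(len(b_hid))] for i in range(len(b_vis))]
    new_bv = [b_vis[i] + lr * sum(p[1][i] for p in partials) / total
              for i in range(len(b_vis))]
    new_bh = [b_hid[j] + lr * sum(p[2][j] for p in partials) / total
              for j in range(len(b_hid))]
    return new_W, new_bv, new_bh

def train_demo(epochs=50, lr=0.1):
    """Train a 4-visible/2-hidden RBM on toy data split into two shards."""
    rng = random.Random(0)
    n_vis, n_hid = 4, 2
    params = ([[rng.gauss(0.0, 0.1) for _ in range(n_hid)] for _ in range(n_vis)],
              [0.0] * n_vis, [0.0] * n_hid)
    data = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
    shards = [data[:2], data[2:]]  # stand-ins for two HDFS input splits
    for epoch in range(epochs):
        # Map: each shard computes its partial gradient sums independently.
        partials = [map_shard(params, shard, seed=epoch * 100 + k)
                    for k, shard in enumerate(shards)]
        # Reduce: average the partial sums and apply one update.
        params = reduce_and_update(params, partials, lr)
    return params

trained_W, trained_b_vis, trained_b_hid = train_demo()
```

Because the reducer divides by the total example count, each update equals a full-batch CD-1 step over all shards, apart from the independent random sampling of hidden states inside each mapper.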

Authors and Affiliations

Ayaz H. Khan, Ali Mustafa Qamar, Aneeq Yusuf, Rehanullah Khan

  • EP ID EP552373
  • DOI 10.14569/IJACSA.2019.0100469

How To Cite

Ayaz H. Khan, Ali Mustafa Qamar, Aneeq Yusuf, Rehanullah Khan (2019). Software Abstractions for Large-Scale Deep Learning Models in Big Data Analytics. International Journal of Advanced Computer Science & Applications, 10(4), 557-566. https://europub.co.uk/articles/-A-552373