Software Abstractions for Large-Scale Deep Learning Models in Big Data Analytics

Abstract

The goal of big data analytics is to analyze datasets of high volume, velocity, and variety for large-scale business intelligence problems. These workloads are normally distributed across massively parallel analytical systems. Deep learning is part of a broader family of machine learning methods based on learning representations of data. It plays a significant role in information analysis by adding value to massive amounts of unsupervised data. A core research area is the development of deep learning algorithms that automatically extract complex data formats at a higher level of abstraction from massive volumes of data. In this paper, we present the latest research trends in the development of parallel algorithms, optimization techniques, tools, and libraries related to big data analytics and deep learning on various parallel architectures. The basic building blocks of deep learning, such as Restricted Boltzmann Machines (RBM) and Deep Belief Networks (DBN), are identified and analyzed for the parallelization of deep learning models. We propose a parallel software API based on PyTorch, the Hadoop Distributed File System (HDFS), Apache Hadoop MapReduce, and MapReduce Job (MRJob) for developing large-scale deep learning models. We obtained a reduction of about 5-30% in the execution time of the deep auto-encoder model, even on a single-node Hadoop cluster. Furthermore, the code complexity of creating multi-layer deep learning models is significantly reduced.

Authors and Affiliations

Ayaz H. Khan, Ali Mustafa Qamar, Aneeq Yusuf, Rehanullah Khan

  • EP ID EP552373
  • DOI 10.14569/IJACSA.2019.0100469

How To Cite

Ayaz H. Khan, Ali Mustafa Qamar, Aneeq Yusuf, Rehanullah Khan (2019). Software Abstractions for Large-Scale Deep Learning Models in Big Data Analytics. International Journal of Advanced Computer Science & Applications, 10(4), 557-566. https://europub.co.uk/articles/-A-552373