Software Abstractions for Large-Scale Deep Learning Models in Big Data Analytics

Abstract

The goal of big data analytics is to analyze datasets of high volume, velocity, and variety for large-scale business intelligence problems. These workloads are normally distributed across massively parallel analytical systems. Deep learning is part of a broader family of machine learning methods based on learning representations of data. It plays a significant role in information analysis by extracting value from massive amounts of unsupervised data. A core research area is the development of deep learning algorithms that automatically extract complex data representations at higher levels of abstraction from massive volumes of data. In this paper, we present the latest research trends in the development of parallel algorithms, optimization techniques, tools, and libraries for big data analytics and deep learning on various parallel architectures. The basic building blocks of deep learning, such as Restricted Boltzmann Machines (RBM) and Deep Belief Networks (DBN), are identified and analyzed with a view to parallelizing deep learning models. We propose a parallel software API based on PyTorch, the Hadoop Distributed File System (HDFS), Apache Hadoop MapReduce, and MapReduce Job (MRJob) for developing large-scale deep learning models. We obtained a reduction of about 5-30% in the execution time of the deep auto-encoder model, even on a single-node Hadoop cluster. Furthermore, the API significantly reduces the code complexity of building multi-layer deep learning models.
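
The abstract does not show the proposed API, so the following is only a minimal sketch of the general idea it describes: training an auto-encoder with a map step that computes partial gradients per data shard and a reduce step that combines them. It is not the authors' implementation. Plain Python stands in for PyTorch, HDFS, Hadoop MapReduce, and MRJob, and every name in it (`mapper`, `reducer`, `run_epoch`, `train`, the toy data, the learning rate) is a hypothetical choice made for illustration.

```python
from collections import defaultdict

# Assumption: a toy tied-weight linear auto-encoder with 2 inputs and 1 hidden
# unit. Encode: h = w . x, decode: x_hat = w * h, loss = ||x_hat - x||^2.


def mapper(shard, w):
    """Map step: emit one partial (gradient sum, loss sum, count) per shard."""
    grad = [0.0, 0.0]
    loss = 0.0
    for x in shard:
        h = w[0] * x[0] + w[1] * x[1]            # hidden activation
        r = [w[0] * h - x[0], w[1] * h - x[1]]   # reconstruction residual
        rw = r[0] * w[0] + r[1] * w[1]
        for k in range(2):
            # d(loss)/d(w_k) = 2 * (r_k * h + (r . w) * x_k)
            grad[k] += 2.0 * (r[k] * h + rw * x[k])
        loss += r[0] ** 2 + r[1] ** 2
    yield "grad", (grad, loss, len(shard))


def reducer(key, values):
    """Reduce step: sum the partial results and return the means."""
    grad = [0.0, 0.0]
    loss = 0.0
    count = 0
    for g, l, n in values:
        grad[0] += g[0]
        grad[1] += g[1]
        loss += l
        count += n
    yield key, ([g / count for g in grad], loss / count)


def run_epoch(shards, w):
    """Simulate one MapReduce job in-process: map, group by key, reduce."""
    grouped = defaultdict(list)
    for shard in shards:
        for key, value in mapper(shard, w):
            grouped[key].append(value)
    outputs = [out for key, vals in grouped.items() for out in reducer(key, vals)]
    return outputs[0][1]  # (mean gradient, mean loss)


def train(shards, w, lr=0.02, epochs=200):
    """Driver loop: one simulated job per epoch, then a gradient-descent update."""
    for _ in range(epochs):
        grad, _ = run_epoch(shards, w)
        w = [w[k] - lr * grad[k] for k in range(2)]
    return w


# Toy data lying on a line, split into two shards as if stored in two blocks.
data = [(t, 2.0 * t) for t in (-1.0, -0.5, 0.5, 1.0)]
shards = [data[:2], data[2:]]
w0 = [0.5, 0.5]

_, loss_before = run_epoch(shards, w0)
w_trained = train(shards, w0)
_, loss_after = run_epoch(shards, w_trained)
print("mean reconstruction loss before:", loss_before)
print("mean reconstruction loss after: ", loss_after)
```

Because the reducer only sums partial results, the averaged gradient does not depend on how the data is sharded, which is what allows the map step to run in parallel. In a real deployment the in-process `run_epoch` would presumably be replaced by a job submitted to the cluster, and the hand-written gradient by a framework's automatic differentiation.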

Authors and Affiliations

Ayaz H. Khan, Ali Mustafa Qamar, Aneeq Yusuf, Rehanullah Khan

  • EP ID EP552373
  • DOI 10.14569/IJACSA.2019.0100469

How To Cite

Ayaz H. Khan, Ali Mustafa Qamar, Aneeq Yusuf, Rehanullah Khan (2019). Software Abstractions for Large-Scale Deep Learning Models in Big Data Analytics. International Journal of Advanced Computer Science & Applications, 10(4), 557-566. https://europub.co.uk/articles/-A-552373