Processing Sampled Big Data

Abstract

Big data processing requires an extremely powerful, large-scale computing setup. This creates a bottleneck not only for the processing infrastructure; it also means that many researchers lack the freedom to analyze large datasets. This paper therefore analyzes how large amounts of data can be processed with machine-learned models that are built on smaller data samples. The work analyzes more than 40 GB of data, testing different strategies for reducing the amount of processed data without compromising detection and model learning in machine learning. Of the many alternatives analyzed, it is observed that a 50% reduction does not drastically harm model performance: on average, using only 50% of the data reduces performance by just 3.6% for SVM and 1.8% for Random Forest. Halving the number of instances means that, in most cases, the data will fit in RAM and processing times will be considerably shorter, saving execution time, resources, or both. The incremental training and testing experiments show that, in special cases, smaller sub-sampled data can be used for model generation in machine learning problems. This is useful where hardware is limited or where one has to select among many available machine learning algorithms.
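The abstract does not say how the subsamples were drawn or how the performance drop was measured, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes class-stratified random sampling of training instances, uses a toy nearest-centroid classifier as a hypothetical stand-in for SVM and Random Forest, generates hypothetical synthetic 2-D data instead of the paper's 40 GB dataset, and reports the drop relative to the full-data accuracy (the paper may define its percentages differently). All function names here are illustrative.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, fraction, seed=0):
    """Return sorted indices of a class-stratified random subsample (assumed scheme).

    Each class keeps round(fraction * class_size) instances (at least one),
    so class proportions stay close to those of the full dataset.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for idxs in by_class.values():
        k = max(1, round(fraction * len(idxs)))
        chosen.extend(rng.sample(idxs, k))
    return sorted(chosen)


def nearest_centroid_fit(X, y):
    """Toy stand-in classifier (NOT SVM or Random Forest): one mean vector per class."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        for j, v in enumerate(x):
            sums[label][j] += v
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def nearest_centroid_predict(centroids, x):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def accuracy(centroids, X, y):
    hits = sum(nearest_centroid_predict(centroids, x) == t for x, t in zip(X, y))
    return hits / len(y)


def make_toy_data(n_per_class, seed):
    """Hypothetical synthetic data: two 2-D Gaussian blobs with unit variance."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, (0.0, 0.0)), (1, (2.0, 2.0))):
        for _ in range(n_per_class):
            X.append([rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)])
            y.append(label)
    return X, y


def incremental_experiment(fractions=(0.1, 0.25, 0.5, 1.0), seed=0):
    """Train on growing training-set fractions and evaluate on a fixed test set."""
    X_train, y_train = make_toy_data(500, seed)
    X_test, y_test = make_toy_data(200, seed + 1)
    results = {}
    for frac in fractions:
        idx = stratified_subsample(y_train, frac, seed)
        model = nearest_centroid_fit([X_train[i] for i in idx], [y_train[i] for i in idx])
        results[frac] = accuracy(model, X_test, y_test)
    return results


if __name__ == "__main__":
    results = incremental_experiment()
    full = results[1.0]
    for frac, acc in results.items():
        # Relative drop versus training on all data (an assumed definition).
        drop = 100.0 * (full - acc) / full
        print(f"{frac:>5.0%} of training data: accuracy={acc:.3f}, relative drop={drop:+.2f}%")
```

Stratifying by class is a design choice made here so that halving the data does not also distort the class balance; plain uniform sampling (`random.sample` over all indices) would be the simpler alternative. In a real setting, the toy classifier would be replaced by the actual SVM or Random Forest implementation being evaluated.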

Authors and Affiliations

Waleed Albattah, Rehan Ullah Khan

  • EP ID EP376017
  • DOI 10.14569/IJACSA.2018.090846

How To Cite

Waleed Albattah, Rehan Ullah Khan (2018). Processing Sampled Big Data. International Journal of Advanced Computer Science & Applications, 9(8), 350-356. https://europub.co.uk/articles/-A-376017