Performance Analysis of Machine Learning Algorithms for Missing Value Imputation

Abstract

Data mining requires a pre-processing task in which the data are prepared, cleaned, integrated, transformed, reduced and discretized to ensure their quality. Missing values are a universal problem in many research domains and are commonly encountered during data cleaning. A missing value occurs when no data value is stored for a variable in an observation. Missing values have undesirable effects on analysis results, especially when they lead to biased parameter estimates. Data imputation is a common way to deal with missing values: substitutes for the missing values are estimated through statistical or machine learning techniques. Nevertheless, examining the strengths (and limitations) of these techniques is important for understanding their characteristics. In this paper, the performance of three machine learning classifiers (K-Nearest Neighbors (KNN), Decision Tree, and Bayesian Networks) is compared in terms of data imputation accuracy. The results show that, among the three classifiers, the Bayesian Network has the most promising performance.
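To make classifier-based imputation concrete, the following is a minimal sketch of KNN imputation together with a simple imputation-accuracy measure. It is not the authors' implementation. It assumes a hypothetical toy dataset in which the feature columns are numeric and complete, the attribute being imputed is categorical, distance is Euclidean, and k = 3. The function names `knn_impute` and `imputation_accuracy` are illustrative only. Accuracy is measured here by hiding values whose true labels are known and counting how many the imputer recovers exactly.

```python
import math
from collections import Counter


def knn_impute(rows, target, k=3):
    """Fill None values in column `target` by majority vote of the k nearest complete rows.

    Assumption (illustrative): every column other than `target` is numeric and has no
    missing values. The input rows are not modified; filled copies are returned.
    """
    complete = [r for r in rows if r[target] is not None]
    features = [c for c in rows[0] if c != target]

    def distance(a, b):
        # Euclidean distance over the (assumed complete) numeric feature columns.
        return math.sqrt(sum((a[c] - b[c]) ** 2 for c in features))

    imputed = []
    for r in rows:
        filled = dict(r)
        if r[target] is None:
            neighbours = sorted(complete, key=lambda c: distance(r, c))[:k]
            votes = Counter(n[target] for n in neighbours)
            # Ties go to the label encountered first, i.e. the nearer neighbour's label.
            filled[target] = votes.most_common(1)[0][0]
        imputed.append(filled)
    return imputed


def imputation_accuracy(true_values, imputed_values):
    """Fraction of artificially hidden values that the imputer recovered exactly."""
    hits = sum(t == p for t, p in zip(true_values, imputed_values))
    return hits / len(true_values)


# Hypothetical toy data: the last two labels were hidden; their true values are A and B.
rows = [
    {"x": 1.0, "y": 1.0, "label": "A"},
    {"x": 1.2, "y": 0.9, "label": "A"},
    {"x": 0.8, "y": 1.1, "label": "A"},
    {"x": 5.0, "y": 5.0, "label": "B"},
    {"x": 5.1, "y": 4.8, "label": "B"},
    {"x": 4.9, "y": 5.2, "label": "B"},
    {"x": 1.1, "y": 1.0, "label": None},
    {"x": 5.0, "y": 4.9, "label": None},
]
hidden_truth = ["A", "B"]

result = knn_impute(rows, "label", k=3)
recovered = [result[6]["label"], result[7]["label"]]
print(recovered, imputation_accuracy(hidden_truth, recovered))
```

The same masking-and-scoring loop could wrap any classifier (a decision tree or a Bayesian network, for instance) in place of the nearest-neighbour vote; only the line that chooses the filled value would change.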

Authors and Affiliations

Nadzurah Zainal Abidin, Amelia Ritahani Ismail, Nurul A. Emran

DOI: 10.14569/IJACSA.2018.090660

How To Cite

Nadzurah Zainal Abidin, Amelia Ritahani Ismail, Nurul A. Emran (2018). Performance Analysis of Machine Learning Algorithms for Missing Value Imputation. International Journal of Advanced Computer Science & Applications, 9(6), 442-447. https://europub.co.uk/articles/-A-324939