Missing Data Imputation using Genetic Algorithm for Supervised Learning

Abstract

Data is an important asset for any organization to run its business successfully. Collected data is often of low quality: it may be noisy, incomplete, or contain missing values. If the quality of the data is low, the results of any data mining algorithm applied to it will also be low. In this paper, we propose a technique to deal with missing values. A genetic algorithm (GA) is used to estimate missing values in datasets: the GA generates optimal sets of missing values, and information gain (IG) is used as the fitness function to measure the performance of an individual solution. Our goal is to impute missing values in a dataset for better classification results. The technique works even better when there is a higher rate of missing values or incomplete information, along with a greater number of distinct values in the attributes/features that have missing values. We compare the proposed technique with single imputation techniques and statistically based multiple imputation (MI) approaches, using various benchmark classification techniques and different performance measures. We show that the proposed method outperforms other state-of-the-art missing data imputation techniques.
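
To make the idea in the abstract concrete, the following is a minimal sketch of GA-based imputation with information gain as the fitness function. It is not the authors' implementation: the abstract does not specify the chromosome encoding, the selection, crossover, or mutation operators, or how IG is aggregated across attributes. Here we assume categorical attributes, one gene per missing cell, truncation selection, one-point crossover, and a fitness equal to the summed IG of the attributes that contain missing values. The names `impute_with_ga`, `information_gain`, and all parameters are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """IG = H(class) - H(class | attribute) for one categorical attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def impute_with_ga(rows, labels, pop_size=30, generations=40,
                   mutation_rate=0.1, seed=0):
    """Fill None cells in `rows` using a simple GA (assumed design, see text).

    Assumes every column has at least one observed value and pop_size >= 4.
    """
    rng = random.Random(seed)
    # One gene per missing cell, identified by its (row, column) position.
    missing = [(i, j) for i, row in enumerate(rows)
               for j, v in enumerate(row) if v is None]
    if not missing:
        return [list(r) for r in rows]
    # Candidate values for a gene: the observed values of its column.
    domains = [sorted({row[j] for row in rows if row[j] is not None})
               for j in range(len(rows[0]))]
    target_cols = sorted({j for _, j in missing})

    def fill(chrom):
        filled = [list(r) for r in rows]
        for (i, j), value in zip(missing, chrom):
            filled[i][j] = value
        return filled

    def fitness(chrom):
        # Assumption: sum IG over the attributes that had missing values.
        filled = fill(chrom)
        return sum(information_gain([r[j] for r in filled], labels)
                   for j in target_cols)

    population = [[rng.choice(domains[j]) for _, j in missing]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]          # truncation selection
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, len(missing)) if len(missing) > 1 else 0
            child = a[:cut] + b[cut:]                # one-point crossover
            for k, (_, j) in enumerate(missing):     # per-gene mutation
                if rng.random() < mutation_rate:
                    child[k] = rng.choice(domains[j])
            children.append(child)
        population = elite + children
    return fill(max(population, key=fitness))


if __name__ == "__main__":
    data = [["sunny", "hot"], ["sunny", None], ["rainy", "cool"],
            [None, "cool"], ["rainy", None], ["sunny", "hot"]]
    classes = ["no", "no", "yes", "yes", "yes", "no"]
    for row in impute_with_ga(data, classes):
        print(row)
```

Because the fitness rewards imputations that make the attributes more informative about the class label, this kind of search is tied to supervised learning: it needs class labels at imputation time. A practical version would also need a strategy for numeric attributes (for example, discretization), which this sketch leaves out.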

Authors and Affiliations

Waseem Shahzad, Qamar Rehman, Ejaz Ahmed

  • EP ID EP251109
  • DOI 10.14569/IJACSA.2017.080360

How To Cite

Waseem Shahzad, Qamar Rehman, Ejaz Ahmed (2017). Missing Data Imputation using Genetic Algorithm for Supervised Learning. International Journal of Advanced Computer Science & Applications, 8(3), 438-445. https://europub.co.uk/articles/-A-251109