Missing Data Imputation using Genetic Algorithm for Supervised Learning

Abstract

Data is an important asset for any organization to run its business successfully. Collected data is often of low quality: it may be noisy, incomplete, or contain missing values. If the quality of the data is low, the results of any data mining algorithm applied to it will also be low. In this paper, we propose a technique for dealing with missing values. A genetic algorithm (GA) is used to estimate the missing values in datasets: the GA generates optimal sets of missing values, and information gain (IG) is used as the fitness function to measure the performance of an individual solution. Our goal is to impute missing values in a dataset for better classification results. The technique works even better when the rate of missing values or incomplete information is higher and when the attributes/features with missing values have a greater number of distinct values. We compare the proposed technique with single imputation techniques and statistically based multiple imputation (MI) approaches, using various benchmark classification techniques and different performance measures. We show that the proposed method outperforms other state-of-the-art missing data imputation techniques.
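The idea in the abstract, a GA that searches over candidate fillings and scores each one by information gain, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the chromosome encoding, selection scheme, or genetic operators, so the truncation selection, single-point crossover, and per-gene mutation below are assumptions, as are all function and parameter names (`information_gain`, `impute_with_ga`, `pop_size`, `mutation_rate`). The sketch handles a single categorical attribute whose missing entries are marked with `None`.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Information gain of one categorical attribute w.r.t. the class labels."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def impute_with_ga(column, labels, pop_size=20, generations=30,
                   mutation_rate=0.1, seed=0):
    """Fill the None entries of `column` with values chosen by a simple GA.

    A chromosome holds one candidate value per missing cell; its fitness is
    the information gain of the filled column (operators are assumptions).
    """
    rng = random.Random(seed)
    missing = [i for i, v in enumerate(column) if v is None]
    domain = sorted({v for v in column if v is not None})
    if not missing:
        return list(column)

    def fill(chromosome):
        filled = list(column)
        for i, v in zip(missing, chromosome):
            filled[i] = v
        return filled

    def fitness(chromosome):
        return information_gain(fill(chromosome), labels)

    # Random initial population drawn from the observed values.
    population = [[rng.choice(domain) for _ in missing]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection (assumed)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Single-point crossover (assumed); trivial when one cell is missing.
            cut = rng.randrange(1, len(missing)) if len(missing) > 1 else 0
            child = a[:cut] + b[cut:]
            # Per-gene mutation: resample a value from the observed domain.
            child = [rng.choice(domain) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children  # parents survive, so the best is kept
    return fill(max(population, key=fitness))


# Toy data: two missing cells in a categorical attribute.
column = ["a", "a", None, "b", "b", None]
labels = ["yes", "yes", "yes", "no", "no", "no"]
imputed = impute_with_ga(column, labels)
print(imputed, information_gain(imputed, labels))
```

Because the parents are carried over unchanged into each new generation, the best filling found so far is never lost. Extending the sketch to several attributes or to numeric values would require design choices that the abstract does not describe.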

Authors and Affiliations

Waseem Shahzad, Qamar Rehman, Ejaz Ahmed

Related Articles

SDME Quality Measure based Stopping Criteria for Iterative Deblurring Algorithms

The motion deblurring problem, with or without noise, is an ill-posed inverse problem, and almost all inverse problems require some sort of parameter selection. Quality of restored image in iterative motion deblurring is depen...

Automation of Combinatorial Interaction Test (CIT) Case Generation and Execution for Requirements based Testing (RBT) of Complex Avionics Systems

In the field of avionics, most of the software systems are either safety critical or mission critical. These systems are developed with high quality standards strictly following the relevant guidelines and procedures. Du...

Analysis and Selection of Features for Gesture Recognition Based on a Micro Wearable Device

More and more researchers are concerned with designing a health-support system for elders that is lightweight, unobtrusive to the user, and low in computational complexity. In the paper, we introduced a micro wearable dev...

Database Preservation: The DBPreserve Approach

In many institutions relational databases are used as a tool for managing information related to day to day activities. Institutions may be required to keep the information stored in relational databases accessible becau...

Assessment of Technology Transfer from Grid power to Photovoltaic: An Experimental Case Study for Pakistan

Pakistan is located where enough solar irradiance strikes the ground that it can be harnessed to eliminate the country's existing blackout problems. Government is focusing towards renewable integrat...

Download PDF file
  • EP ID EP251109
  • DOI 10.14569/IJACSA.2017.080360

How To Cite

Waseem Shahzad, Qamar Rehman, Ejaz Ahmed (2017). Missing Data Imputation using Genetic Algorithm for Supervised Learning. International Journal of Advanced Computer Science & Applications, 8(3), 438-445. https://europub.co.uk/articles/-A-251109