Missing Data Imputation using Genetic Algorithm for Supervised Learning
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2017, Vol 8, Issue 3
Abstract
Data is an important asset for any organization that wants to run its business successfully. Collected data often suffers from quality problems such as noise, incomplete records, and missing values. If the quality of the data is low, the results of any data mining algorithm applied to it will also be poor. In this paper, we propose a technique to deal with missing values. A genetic algorithm (GA) is used to estimate the missing values in a dataset: the GA generates optimal sets of missing values, and information gain (IG) serves as the fitness function that measures the quality of an individual solution. Our goal is to impute missing values in a dataset to obtain better classification results. The technique works even better when the rate of missing values (incomplete information) is higher and when the attributes/features with missing values have a greater number of distinct values. We compare the proposed technique with single imputation techniques and with statistically based multiple imputation (MI) approaches, using various benchmark classifiers and several performance measures. We show that our proposed method outperforms other state-of-the-art missing data imputation techniques.
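To make the GA-with-IG-fitness idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a single categorical attribute whose missing cells are marked `None`, a chromosome that assigns one candidate value (drawn from the attribute's observed values) to every missing cell, and a fitness equal to the information gain of the completed attribute with respect to the class label. The GA operators (tournament selection, single-point crossover, random-reset mutation, elitism) and all names such as `ga_impute` are hypothetical choices made for illustration; the abstract does not specify them.

```python
import math
import random
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (base 2) of a sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """IG(class; feature) = H(class) - sum_v p(v) * H(class | feature = v)."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def ga_impute(feature: List[Optional[Hashable]], labels: Sequence[Hashable],
              pop_size: int = 30, generations: int = 50,
              mutation_rate: float = 0.1, seed: int = 0) -> List[Hashable]:
    """Hypothetical helper: fill None cells of one categorical feature via a GA.

    Assumption: each chromosome holds one value per missing cell, and its
    fitness is the IG of the completed feature w.r.t. the class labels.
    """
    rng = random.Random(seed)
    # Candidate values: the distinct observed values (sorted for determinism).
    domain = sorted({v for v in feature if v is not None}, key=repr)
    missing = [i for i, v in enumerate(feature) if v is None]

    def complete(chrom):
        # Return a copy of the feature with the chromosome's values filled in.
        filled = list(feature)
        for idx, val in zip(missing, chrom):
            filled[idx] = val
        return filled

    def fitness(chrom):
        return information_gain(complete(chrom), labels)

    def tournament(pop, k=3):
        # Tournament selection: best of k randomly drawn chromosomes.
        return max(rng.sample(pop, k), key=fitness)

    population = [[rng.choice(domain) for _ in missing] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        next_gen = [best]  # elitism: keep the best chromosome found so far
        while len(next_gen) < pop_size:
            a, b = tournament(population), tournament(population)
            cut = rng.randint(0, len(missing))  # single-point crossover
            child = a[:cut] + b[cut:]
            # Random-reset mutation: replace a gene with a random domain value.
            child = [rng.choice(domain) if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
        best = max(population, key=fitness)
    return complete(best)


if __name__ == "__main__":
    feature = ["a", "a", None, "b", "b", None]
    labels = ["yes", "yes", "yes", "no", "no", "no"]
    imputed = ga_impute(feature, labels)
    print(imputed, information_gain(imputed, labels))
```

In this toy example the only imputation that makes the attribute perfectly predictive of the class is `"a"` for the third row and `"b"` for the last, so the IG-driven search should converge to it. Note that maximizing IG deliberately favors values that agree with the class label; this matches the abstract's goal of better classification, but a real implementation would also need to handle several attributes with missing values and numeric attributes, which this sketch does not attempt.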
Authors and Affiliations
Waseem Shahzad, Qamar Rehman, Ejaz Ahmed