Missing Data Imputation using Genetic Algorithm for Supervised Learning
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2017, Vol 8, Issue 3
Abstract
Data is an important asset for any organization to successfully run its business. Collected data often suffers from quality problems such as noise, incompleteness, and missing values. If the quality of the data is low, the results of any data mining algorithm applied to it will also be low. In this paper, we propose a technique to deal with missing values. A genetic algorithm (GA) is used to estimate the missing values in datasets: the GA generates candidate sets of values for the missing entries, and information gain (IG) is used as the fitness function to measure the performance of an individual solution. Our goal is to impute missing values in a dataset so as to obtain better classification results. The technique works even better when there is a higher rate of missing values or incomplete information, along with a greater number of distinct values in the attributes/features that have missing values. We compare the proposed technique with single imputation techniques and with statistically based multiple imputation (MI) approaches, using various benchmark classification techniques and different performance measures. We show that the proposed method outperforms other state-of-the-art missing data imputation techniques.
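The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the chromosome encoding, the selection and crossover operators, or any parameter values, so everything below (one gene per missing cell, truncation selection, one-point crossover, the names `information_gain` and `impute_with_ga`, and all default parameters) is an assumption made for illustration. Only the use of a GA with information gain as the fitness function comes from the abstract.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Information gain of one categorical attribute with respect to the class."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def impute_with_ga(values, labels, pop_size=20, generations=30,
                   mutation_rate=0.1, seed=0):
    """Fill None entries in `values` using a GA that maximises information gain.

    Assumed encoding: one gene per missing cell, drawn from the observed values.
    """
    rng = random.Random(seed)
    missing = [i for i, v in enumerate(values) if v is None]
    domain = sorted({v for v in values if v is not None})
    if not missing:
        return list(values)

    def fill(chromosome):
        filled = list(values)
        for i, gene in zip(missing, chromosome):
            filled[i] = gene
        return filled

    def fitness(chromosome):
        return information_gain(fill(chromosome), labels)

    population = [[rng.choice(domain) for _ in missing] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(len(missing) + 1)
            child = a[:cut] + b[cut:]  # one-point crossover
            # Mutation: occasionally replace a gene with a random observed value.
            child = [rng.choice(domain) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children
    return fill(max(population, key=fitness))


if __name__ == "__main__":
    attribute = ["a", "a", None, "b", "b", None]
    classes = [0, 0, 0, 1, 1, 1]
    print(impute_with_ga(attribute, classes))
```

Because the fitness function uses the class labels, a sketch like this only applies in a supervised setting, which matches the scope stated in the title. It handles a single categorical attribute; extending it to several attributes or to numeric values would require design decisions that the abstract does not describe.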
Authors and Affiliations
Waseem Shahzad, Qamar Rehman, Ejaz Ahmed