Comparative Study of Three Imputation Methods to Treat Missing Values
Journal Title: INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY - Year 2013, Vol 11, Issue 7
Abstract
One relevant problem in data preprocessing is the presence of missing data, which leads to poor-quality patterns being extracted after mining. Imputation is a widely used procedure that replaces the missing values in a data set with probable values. The advantage of this approach is that the missing-data treatment is independent of the learning algorithm used, which allows the user to select the most suitable imputation method for each situation. This paper analyzes various imputation methods proposed in the field of statistics with respect to data mining. It gives a comparative analysis of three different imputation approaches that can be used to impute missing attribute values in data mining and identifies the most promising one. An artificial input data file of 1000 numeric records is used to investigate the performance of these methods, and a Z-test is used to test their significance.
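The abstract does not name the three methods or describe the exact test setup, so the following is only a minimal sketch under stated assumptions. It uses mean, median, and random hot-deck imputation as hypothetical stand-ins for the compared methods. It generates a synthetic file of 1000 Gaussian-distributed numeric records with about 10% of the values removed at random, and it applies a one-sample Z-test comparing each imputed column's mean with the mean of the complete data. The helper names (`make_data`, `impute_mean`, `impute_median`, `impute_hot_deck`, `z_test`) and all parameter values are illustrative and are not taken from the paper.

```python
import math
import random
import statistics

# Hypothetical setup: the abstract does not name the three methods or the
# distribution of the artificial data; these choices are illustrative only.


def make_data(n=1000, missing_rate=0.1, seed=42):
    """Generate n numeric records, then mark a random fraction as missing (None)."""
    rng = random.Random(seed)
    complete = [rng.gauss(50.0, 10.0) for _ in range(n)]
    observed = [None if rng.random() < missing_rate else x for x in complete]
    return complete, observed


def impute_mean(values):
    """Replace every missing value with the mean of the observed values."""
    known = [v for v in values if v is not None]
    fill = statistics.fmean(known)
    return [fill if v is None else v for v in values]


def impute_median(values):
    """Replace every missing value with the median of the observed values."""
    known = [v for v in values if v is not None]
    fill = statistics.median(known)
    return [fill if v is None else v for v in values]


def impute_hot_deck(values, seed=0):
    """Random hot-deck: replace each missing value with a randomly drawn observed value."""
    rng = random.Random(seed)
    known = [v for v in values if v is not None]
    return [rng.choice(known) if v is None else v for v in values]


def z_test(sample, mu0):
    """One-sample Z-test of H0: mean(sample) == mu0.

    Sigma is estimated by the sample standard deviation, a reasonable
    approximation for large samples such as n = 1000.
    Returns the Z statistic and its two-sided p-value.
    """
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    z = (statistics.fmean(sample) - mu0) / se
    p = 2.0 * (1.0 - statistics.NormalDist().cdf(abs(z)))
    return z, p


if __name__ == "__main__":
    complete, observed = make_data()
    reference = statistics.fmean(complete)  # mean before values were removed
    for name, method in [("mean", impute_mean),
                         ("median", impute_median),
                         ("hot-deck", impute_hot_deck)]:
        z, p = z_test(method(observed), reference)
        print(f"{name:<8} z = {z:+.3f}  p = {p:.3f}")
```

Note that the imputed column and the reference data are not independent samples, so this Z-test is a simplification for illustration. Mean imputation also leaves the observed mean unchanged while shrinking the column's variance, which is one reason comparing several imputation methods is worthwhile.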
Authors and Affiliations
Rahul Singhai