Data Distribution Aware Classification Algorithm based on K-Means
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2017, Vol 8, Issue 9
Abstract
Businesses increasingly need to make data-driven decisions based on precise data analysis. Many data mining strategies exist for this purpose, but existing strategies still need attention from researchers so that they can be adapted to modern data analysis needs. One popular algorithm is K-Means. This paper proposes a novel improvement to the classical K-Means classification algorithm. Data characteristics such as distribution, high dimensionality, size, and sparseness are known to have a great impact on the success of K-Means clustering, which directly affects classification accuracy. In this study, the K-Means algorithm was modified to remedy the degradation in classification accuracy that is observed when the data distribution is not suited to clustering by centroids that are each represented by a single mean. Specifically, this paper proposes to intelligently include the effect of variance, based on the detected distribution of the data. To evaluate the performance improvement of the proposed method, several experiments were carried out using different real datasets. The presented results, obtained after extensive experiments, show that the proposed algorithm improves the classification accuracy of K-Means. The achieved performance was also compared against several recent classification studies that are based on different classification schemes.
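The abstract does not spell out how variance enters the algorithm, so the following is only an illustrative sketch and not the authors' method. It assumes one centroid per class and contrasts a plain squared Euclidean distance to the class mean with a distance scaled by per-feature variance. The names `fit_centroids`, `predict`, `use_variance`, and `eps`, as well as the toy data, are hypothetical.

```python
from statistics import fmean, pvariance


def fit_centroids(X, y):
    """Return {label: (means, variances)}, computed per feature for each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        cols = list(zip(*rows))  # transpose rows into per-feature columns
        means = [fmean(c) for c in cols]
        variances = [pvariance(c) for c in cols]
        centroids[label] = (means, variances)
    return centroids


def predict(x, centroids, use_variance=False, eps=1e-9):
    """Assign x to the nearest centroid.

    With use_variance=False the distance is squared Euclidean (mean only).
    With use_variance=True each squared difference is divided by that
    feature's class variance (an assumed, simplified way to use variance).
    """
    def dist(means, variances):
        total = 0.0
        for xi, m, v in zip(x, means, variances):
            scale = (v + eps) if use_variance else 1.0
            total += (xi - m) ** 2 / scale
        return total

    return min(centroids, key=lambda lab: dist(*centroids[lab]))


# Toy data: class "a" is tightly packed, class "b" is widely spread.
X = [
    [-0.1, -0.1], [0.1, 0.1], [-0.1, 0.1], [0.1, -0.1],  # class "a"
    [2.0, -2.0], [6.0, 2.0], [2.0, 2.0], [6.0, -2.0],    # class "b"
]
y = ["a", "a", "a", "a", "b", "b", "b", "b"]
centroids = fit_centroids(X, y)

point = [1.5, 0.0]
print(predict(point, centroids))                     # prints: a
print(predict(point, centroids, use_variance=True))  # prints: b
```

The point lies closer to the mean of class "a", but far outside its tight spread, so the variance-scaled distance assigns it to the wider class "b". Dividing by variance alone tends to favor high-variance classes; a fuller treatment would need some form of correction, which this sketch leaves out.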
Authors and Affiliations
Tamer Tulgar, Ali Haydar, Ibrahim Ersan
A Review and Classification of Widely used Offline Brain Datasets
Brain Computer Interfaces (BCI) are a natural extension to Human Computer Interaction (HCI) technologies. BCI is especially useful for people suffering from diseases, such as Amyotrophic Lateral Sclerosis (ALS) which cau...
An Improved Transformer for LLC Resonant Inverter for Induction Heating Applications
A new trend in power converters is to design a planar transformer that aims for low profile. However, at high frequency, the planar transformer AC losses become significant due to the proximity and skin effects. In this...
Improved Langley and Ratio Langley Methods for Improving Sky-Radiometer Accuracy
The Improved Langley Method (ILM) is proposed to improve the calibration accuracy of the sky-radiometer. The ILM uses the fact that the calibration coefficients of other arbitrary wavelengths can be presumed from the calibration coef...
Cadastral and Tea Production Management System with Wireless Sensor Network, GIS based System and IoT Technology
Cadastral and tea production management system utilizing wireless sensor network of Internet of Things (IoT) technology is proposed. To improve efficiency of tea productions, cadastral management and tea production proce...
The Computation of Assimilation of Arabic Language Phonemes
Computational phonology is a fairly new science that studies phonological rules from a computational point of view. Computational phonology is based on the phonological rules, which are the processes tha...