Implementation of K-Means Clustering Algorithm in Hadoop Framework

Journal Title: International Journal for Research in Applied Science and Engineering Technology (IJRASET) - Year 2015, Vol 3, Issue 5

Abstract

The rapid growth of digital data is an emerging area of concern and has focused attention on data mining techniques. A data mining task involves programmatic or semi-programmatic analysis of large quantities of data to extract hidden, interesting patterns such as groups of data records; finding such groups is referred to as cluster analysis. Clustering partitions data items into groups (clusters) so that the data objects in each cluster share common characteristics. Data collected in practical scenarios is more often than not random and unstructured or semi-structured, so such data sets need to be analyzed to derive meaningful hidden information. In these scenarios, unsupervised algorithms come into the picture to process unstructured or semi-structured data sets. Several clustering algorithms have been proposed in the past few years, among which k-means is one of the simplest and most popular unsupervised learning algorithms for the well-known clustering problem. The k-means algorithm produces a specified number, k, of disjoint clusters. It requires k initial cluster centers, which must be specified beforehand and are selected at random. This paper discusses the implementation of the k-means algorithm in the MapReduce programming model, run on the Hadoop distributed environment.
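As an illustration of how k-means can be split into map and reduce steps, the following is a minimal single-process Python sketch. It is not the authors' Hadoop implementation and does not use the Hadoop API. The function names (map_phase, reduce_phase, kmeans), the squared Euclidean distance, the stopping rule, and the choice to leave an empty cluster's center unchanged are all assumptions made for this example.

```python
import random
from collections import defaultdict


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def map_phase(points, centers):
    """Map step: emit (index of nearest center, point) for every point."""
    for point in points:
        nearest = min(range(len(centers)),
                      key=lambda i: squared_distance(point, centers[i]))
        yield nearest, point


def reduce_phase(pairs, centers):
    """Reduce step: replace each center by the mean of its assigned points."""
    groups = defaultdict(list)
    for idx, point in pairs:
        groups[idx].append(point)
    # Assumption: a center that received no points keeps its old position.
    new_centers = list(centers)
    for idx, members in groups.items():
        new_centers[idx] = tuple(sum(c) / len(members) for c in zip(*members))
    return new_centers


def kmeans(points, k, max_iter=100, seed=0):
    """Repeat map and reduce until the centers stop changing."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k initial centers chosen at random
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centers), centers)
        if updated == centers:
            break
        centers = updated
    return centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(sorted(kmeans(data, k=2)))
```

In a real Hadoop job, the map step would run in parallel over input splits, the framework would group the emitted pairs by cluster index, and each iteration would typically be submitted as a separate job that reads the current centers from shared storage.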

Authors and Affiliations

Uday Kumar S, Naveen D Chandavarkar
