Implementation of K-Means Clustering Algorithm in Hadoop Framework

Abstract

The rapid growth of digital data is an emerging area of concern and has led to an increased focus on data mining techniques. The core data mining task is the automatic or semi-automatic analysis of large quantities of data to extract hidden, interesting patterns, such as groups of related data records; this is referred to as cluster analysis. Clustering partitions data items into different groups (clusters) so that the objects in each cluster share common characteristics. Data collected in practical scenarios is usually unstructured or semi-structured, so it must be analyzed to derive meaningful hidden information. In such scenarios, unsupervised algorithms are used to process unstructured or semi-structured data sets. Several clustering algorithms have been proposed in recent years, among which the k-means algorithm is one of the simplest and most popular unsupervised learning algorithms for solving the well-known clustering problem. K-means produces a fixed number of disjoint clusters. The number of clusters k must be specified beforehand, and the k initial cluster centers are selected randomly. This paper discusses the implementation of the k-means algorithm in the MapReduce programming model, run in the Hadoop distributed environment.
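To illustrate how one k-means iteration fits the MapReduce model, here is a minimal sketch in plain Python rather than Hadoop's Java API. The map step assigns each point to its nearest center and emits (cluster_id, point); the reduce step averages the points emitted under each cluster_id to produce the new centers; a driver loop repeats until the centers stop changing. The function names, the squared Euclidean distance, the iteration cap, the fixed random seed, and the rule of keeping an old center when its cluster receives no points are assumptions for illustration, not details taken from the paper.

```python
import random
from collections import defaultdict


def nearest(point, centers):
    """Return the index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
    )


def map_phase(points, centers):
    """Map step (hypothetical name): emit (cluster_id, point) for every input point."""
    for p in points:
        yield nearest(p, centers), p


def reduce_phase(pairs, dims):
    """Reduce step (hypothetical name): sum and count points per cluster_id, then average."""
    sums = defaultdict(lambda: [0.0] * dims)
    counts = defaultdict(int)
    for cid, p in pairs:
        for d in range(dims):
            sums[cid][d] += p[d]
        counts[cid] += 1
    return {cid: tuple(s / counts[cid] for s in sums[cid]) for cid in sums}


def kmeans(points, k, max_iters=20, seed=0):
    """Driver loop: one map + reduce pass per iteration until centers stop moving."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k initial centers picked at random from the data
    for _ in range(max_iters):
        new = reduce_phase(map_phase(points, centers), len(points[0]))
        # Assumption: a cluster that received no points keeps its previous center.
        updated = [new.get(i, centers[i]) for i in range(k)]
        if updated == centers:
            break
        centers = updated
    return centers
```

For example, `kmeans([(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)], 2)` separates the two groups of points, with one center near (0.33, 0.33) and the other near (10.33, 10.33). In a real Hadoop job, each iteration would typically run as a separate MapReduce job, with the current centers shared with every mapper and the shuffle phase grouping the emitted pairs by cluster_id before they reach the reducers.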

Authors and Affiliations

Uday Kumar S, Naveen D Chandavarkar

How To Cite

Uday Kumar S, Naveen D Chandavarkar (2015). Implementation of K-Means Clustering Algorithm in Hadoop Framework. International Journal for Research in Applied Science and Engineering Technology (IJRASET), 3(5), -. https://europub.co.uk/articles/-A-20614