Efficient Way of Determining the Number of Clusters Using Hadoop Architecture

Journal Title: UNKNOWN - Year 2015, Vol 4, Issue 2

Abstract

Data mining extracts information from a data set and transforms it into an understandable structure. Clustering plays an important role in many areas, including exploratory data analysis, pattern recognition, computer vision, and information retrieval. The key idea is to view clustering as a supervised classification problem in which we estimate the “true” class labels. Determining the valid number of clusters is not easy. Several well-known methods have been used to estimate it, such as the Gap statistic, path-based clustering, and Figure of Merit (FOM), but these methods do not find the number of clusters efficiently. This paper focuses on the “Average Intracluster Distance” index to validate the estimated number of arbitrarily shaped clusters. On Hadoop, the proposed technique is based on the local relations between patterns and their clustering labels, and it uses a Minimum Spanning Tree (MST) algorithm that exploits the multiplicity property of the MST to obtain accurate results efficiently.
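The idea of validating a candidate number of clusters with an MST and an intracluster distance index can be illustrated with a minimal single-machine sketch. It is not the paper's Hadoop implementation, and it does not model the multiplicity property mentioned in the abstract. It assumes that clusters are obtained by cutting the k-1 heaviest MST edges and that the "Average Intracluster Distance" is the mean pairwise Euclidean distance between points sharing a label; both are assumptions made for illustration, and all function names are hypothetical.

```python
import math
from itertools import combinations


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the MST as a list of (weight, i, j) tuples.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge weight into the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mst_clusters(points, k):
    """Assumed clustering rule: drop the k-1 heaviest MST edges and
    label each point by its connected component (union-find root)."""
    edges = sorted(mst_edges(points))
    kept = edges[: len(edges) - (k - 1)]
    root = list(range(len(points)))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for _, i, j in kept:
        root[find(i)] = find(j)
    return [find(i) for i in range(len(points))]


def average_intracluster_distance(points, labels):
    """Assumed index: mean pairwise distance over same-label point pairs."""
    total, count = 0.0, 0
    for i, j in combinations(range(len(points)), 2):
        if labels[i] == labels[j]:
            total += math.dist(points[i], points[j])
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # Two well-separated groups of three points each (made-up data).
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    for k in (1, 2, 3):
        labels = mst_clusters(pts, k)
        print(k, average_intracluster_distance(pts, labels))
```

Under these assumptions, one would look for the value of k at which the index drops sharply and then levels off. The pairwise loops are quadratic in the number of points, which is the kind of cost a distributed implementation would aim to spread across workers.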

How To Cite

(2015). Efficient Way of Determining the Number of Clusters Using Hadoop Architecture. UNKNOWN, 4(2), -. https://europub.co.uk/articles/-A-355025