Efficient Way of Determining the Number of Clusters Using Hadoop Architecture

Journal Title: International Journal of Science and Research (IJSR) - Year 2015, Vol 4, Issue 2

Abstract

Data mining extracts information from a data set and transforms it into an understandable structure. Clustering plays an important role in many areas, such as exploratory data analysis, pattern recognition, computer vision, and information retrieval. The key idea is to view clustering as a supervised classification problem in which the “true” class labels are estimated. Determining a valid number of clusters is not easy. Several well-known methods have been used to find the correct number of clusters, such as the Gap statistic, path-based clustering, and the Figure of Merit (FOM), but these methods do not find the number of clusters efficiently. This paper focuses on the “Average Intracluster Distance” index to validate the estimated number of arbitrarily shaped clusters. In Hadoop, the proposed technique is based on the local relations between patterns and their clustering labels; it uses a Minimum Spanning Tree (MST) algorithm that relies on the multiplicity property of the MST to obtain accurate results efficiently.
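
The abstract does not define the index or the MST procedure in detail, so the following is only a minimal single-machine sketch under stated assumptions, not the paper's Hadoop implementation. It assumes that clusters are obtained by cutting the k-1 heaviest MST edges, and that the "Average Intracluster Distance" is the mean pairwise Euclidean distance inside each cluster, averaged over clusters. The function names `prim_mst`, `mst_clusters`, and `average_intracluster_distance` are hypothetical.

```python
import math
from itertools import combinations


def prim_mst(points):
    """Return the MST edges (weight, i, j) of the complete Euclidean graph."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_clusters(points, k):
    """Assumed clustering step: cut the k-1 heaviest MST edges."""
    edges = sorted(prim_mst(points))
    kept = edges[: len(edges) - (k - 1)]
    root = list(range(len(points)))  # union-find parent array

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for _, i, j in kept:
        root[find(i)] = find(j)
    return [find(i) for i in range(len(points))]


def average_intracluster_distance(points, labels):
    """Assumed index: mean pairwise distance per cluster, averaged over
    clusters that contain at least two points."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    per_cluster = []
    for members in groups.values():
        pairs = list(combinations(members, 2))
        if pairs:
            per_cluster.append(sum(math.dist(a, b) for a, b in pairs) / len(pairs))
    return sum(per_cluster) / len(per_cluster) if per_cluster else 0.0


# Illustrative data only: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels_k2 = mst_clusters(points, 2)
aid_k1 = average_intracluster_distance(points, mst_clusters(points, 1))
aid_k2 = average_intracluster_distance(points, labels_k2)
```

Under these assumptions, a candidate number of clusters can be judged by how sharply the index drops as k increases; on the toy data the value for k = 2 is much smaller than for k = 1. How the paper distributes this computation across Hadoop is not stated in the abstract and is not modeled here.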

EP ID: EP355025

How To Cite

(2015). Efficient Way of Determining the Number of Clusters Using Hadoop Architecture. International Journal of Science and Research (IJSR), 4(2), -. https://europub.co.uk/articles/-A-355025