Development and Evaluation of a Parallel K-means Algorithm for Big Data Analysis in Google MapReduce Environment

Journal Title: International Journal of Knowledge and Innovation Studies - Year 2024, Vol 2, Issue 3

Abstract

Executing iterative big data analysis algorithms in the Google Cloud MapReduce environment is challenging. To address this, a parallel K-means algorithm was developed that leverages the distributed computing power of the platform. Traditional K-means, which proceeds iteratively, is adapted into a parallel version using MapReduce to improve computational efficiency. The parallel algorithm is organized into multiple super-steps: the super-steps run one after another, while the work within each super-step runs in parallel. Each super-step corresponds to one iteration of the serial K-means algorithm, with parallel computation at each node determining the mean of each cluster, which serves as its updated center. Experimental evaluations showed that the parallel K-means algorithm is effective and accurate. For a dataset of 450 water samples, a parallel speedup factor of 20.8 was achieved, significantly reducing the time required for data analysis. Such a reduction matters in time-sensitive applications, such as coal mine rescue operations, where quick decision-making is essential. The results indicate that the proposed parallel K-means algorithm is a feasible and efficient approach to handling large-scale datasets in cloud environments.
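
The super-step structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not use any Google MapReduce API: it is a plain-Python simulation, and the names `map_phase`, `reduce_phase`, and `parallel_kmeans`, the partition layout, and the convergence tolerance are all assumptions made for illustration. The map phase assigns each point in a partition to its nearest center, the reduce phase averages the points of each cluster, and one map-plus-reduce pass stands in for one super-step.

```python
from math import dist


def map_phase(partition, centers):
    """Map: for each point in one partition, emit (nearest-center index, point)."""
    for point in partition:
        nearest = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
        yield nearest, point


def reduce_phase(pairs, centers):
    """Reduce: average the points of each cluster to obtain the new centers."""
    sums, counts = {}, {}
    for idx, point in pairs:
        if idx in sums:
            sums[idx] = [s + x for s, x in zip(sums[idx], point)]
            counts[idx] += 1
        else:
            sums[idx] = list(point)
            counts[idx] = 1
    # Assumption: a cluster that received no points keeps its previous center.
    return [
        tuple(s / counts[i] for s in sums[i]) if i in sums else tuple(centers[i])
        for i in range(len(centers))
    ]


def parallel_kmeans(partitions, centers, max_super_steps=100, tol=1e-9):
    """Run super-steps one after another until the centers stop moving."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_super_steps):
        # In a real cluster each partition would be mapped on its own node;
        # here the partitions are processed in a loop to keep the sketch runnable.
        pairs = [kv for part in partitions for kv in map_phase(part, centers)]
        new_centers = reduce_phase(pairs, centers)
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers


if __name__ == "__main__":
    # Hypothetical toy data split across two partitions (stand-ins for nodes).
    partitions = [[(0.0, 0.0), (10.0, 10.0)], [(0.0, 2.0), (10.0, 12.0)]]
    print(parallel_kmeans(partitions, [(0.0, 0.0), (10.0, 10.0)]))
```

Because each super-step needs the centers produced by the previous one, the super-steps stay sequential and only the work inside a super-step is distributed.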

Authors and Affiliations

Junwei Zhao, Xuexu Yuan, Qingtao Hou, Hanyu Gao, Chunyu Gao, Yuanyuan Zhang

  • EP ID EP755051
  • DOI 10.56578/ijkis020303

How To Cite

Junwei Zhao, Xuexu Yuan, Qingtao Hou, Hanyu Gao, Chunyu Gao, Yuanyuan Zhang (2024). Development and Evaluation of a Parallel K-means Algorithm for Big Data Analysis in Google MapReduce Environment. International Journal of Knowledge and Innovation Studies, 2(3), -. https://europub.co.uk/articles/-A-755051