Big Data Management with Incremental K-Means Trees–GPU-Accelerated Construction and Visualization

Journal Title: Informatics - Year 2017, Vol 4, Issue 3

Abstract

While big data is revolutionizing scientific research, the tasks of data management and analytics are becoming more challenging than ever. One way to mitigate the difficulty is to obtain the multilevel hierarchy embedded in the data. Knowing the hierarchy not only reveals the nature of the data but is also often the first step in big data analytics. However, current algorithms for learning the hierarchy typically do not scale to large volumes of high-dimensional data. To tackle this challenge, we propose a new scalable approach for constructing the tree structure from data. Our method builds the tree in a bottom-up manner with an adapted incremental k-means. By referencing the distribution of point distances, one can flexibly control the height of the tree and the branching of each node. Dimension reduction is also conducted as a preprocessing step to further boost computing efficiency. The algorithm has a parallel design and is implemented with CUDA (Compute Unified Device Architecture), so that it can be efficiently applied to big data. We test the algorithm with two real-world datasets, and the results are visualized with extended circular dendrograms and other visualization techniques.
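As a rough illustration of the bottom-up idea described in the abstract, the following is a minimal sequential sketch in Python. It is not the authors' implementation: it omits the dimension reduction and the CUDA parallelization, and the function names `incremental_kmeans` and `build_tree`, as well as the per-level distance thresholds, are assumptions made for this example. In the paper's setting the thresholds would presumably be chosen from the distribution of point distances; here they are simply passed in by hand.

```python
import math


def incremental_kmeans(points, threshold):
    """Single-pass clustering (hypothetical helper, not the paper's code).

    Each point joins its nearest centroid if that centroid lies within
    `threshold`; otherwise the point opens a new cluster.
    """
    centroids = []  # running mean of each cluster
    counts = []     # number of points absorbed by each cluster
    labels = []     # cluster index assigned to each input point
    for p in points:
        best, best_d = -1, float("inf")
        for i, c in enumerate(centroids):
            d = math.dist(p, c)
            if d < best_d:
                best, best_d = i, d
        if best >= 0 and best_d <= threshold:
            # Update the running mean of the chosen cluster.
            counts[best] += 1
            n = counts[best]
            centroids[best] = [ci + (pi - ci) / n
                               for ci, pi in zip(centroids[best], p)]
        else:
            centroids.append(list(p))
            counts.append(1)
            best = len(centroids) - 1
        labels.append(best)
    return centroids, labels


def build_tree(points, thresholds):
    """Build tree levels bottom-up (assumed scheme for illustration).

    `thresholds` should increase from level to level. The centroids of
    one level become the input points of the next. Returns a list of
    (centroids, labels) pairs, one per level, leaves first.
    """
    levels = []
    current = [list(p) for p in points]
    for t in thresholds:
        centroids, labels = incremental_kmeans(current, t)
        levels.append((centroids, labels))
        current = centroids
        if len(centroids) == 1:  # reached a single root node
            break
    return levels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11)]
    for depth, (cents, labs) in enumerate(build_tree(data, [2, 20])):
        print(depth, cents, labs)
```

In this sketch, a larger threshold at a given level merges more nodes and so lowers the branching above it, while the number of thresholds bounds the height of the tree. The result of a single pass depends on the order in which points arrive, which is a known property of this kind of one-pass clustering.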

Authors and Affiliations

Jun Wang, Alla Zelenyuk, Dan Imre and Klaus Mueller

  • EP ID EP44093
  • DOI https://doi.org/10.3390/informatics4030024

How To Cite

Jun Wang, Alla Zelenyuk, Dan Imre and Klaus Mueller (2017). Big Data Management with Incremental K-Means Trees–GPU-Accelerated Construction and Visualization. Informatics, 4(3), -. https://europub.co.uk/articles/-A-44093