Big Data Management with Incremental K-Means Trees–GPU-Accelerated Construction and Visualization

Journal Title: Informatics - Year 2017, Vol 4, Issue 3

Abstract

While big data is revolutionizing scientific research, the tasks of data management and analytics are becoming more challenging than ever. One way to ease the difficulty is to recover the multilevel hierarchy embedded in the data. Knowing the hierarchy not only reveals the nature of the data but is also often the first step in big data analytics. However, current algorithms for learning the hierarchy typically do not scale to large volumes of high-dimensional data. To tackle this challenge, we propose a new scalable approach for constructing the tree structure from data. Our method builds the tree bottom-up using an adapted incremental k-means. By referencing the distribution of point distances, one can flexibly control the height of the tree and the branching of each node. Dimension reduction is also performed as a preprocessing step to further improve computing efficiency. The algorithm has a parallel design and is implemented with CUDA (Compute Unified Device Architecture) so that it can be applied efficiently to big data. We test the algorithm with two real-world datasets and visualize the results with extended circular dendrograms and other visualization techniques.
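
The abstract only outlines the method, so the following is a minimal, sequential Python sketch of the general idea rather than the authors' CUDA implementation. It assumes a single-pass incremental k-means in which a point joins its nearest center when the distance is within a threshold and otherwise opens a new cluster, and it assumes the threshold is taken as a quantile of the pairwise point distances. The function names (`incremental_kmeans`, `distance_quantile`, `build_tree`) and the default quantile values are hypothetical and chosen for illustration only.

```python
import math


def incremental_kmeans(points, threshold):
    """Single pass: a point joins its nearest center if it lies within
    `threshold`, otherwise it starts a new cluster.
    Returns (centers, labels)."""
    centers, counts, labels = [], [], []
    for p in points:
        best, best_d = -1, float("inf")
        for i, c in enumerate(centers):
            d = math.dist(p, c)
            if d < best_d:
                best, best_d = i, d
        if best >= 0 and best_d <= threshold:
            counts[best] += 1
            n = counts[best]
            # Running-mean update of the winning center.
            centers[best] = [ci + (pi - ci) / n
                             for ci, pi in zip(centers[best], p)]
        else:
            centers.append(list(p))
            counts.append(1)
            best = len(centers) - 1
        labels.append(best)
    return centers, labels


def distance_quantile(points, q):
    """q-quantile of all pairwise distances (brute force, illustration only)."""
    dists = sorted(math.dist(a, b)
                   for i, a in enumerate(points)
                   for b in points[i + 1:])
    return dists[min(len(dists) - 1, int(q * len(dists)))]


def build_tree(points, quantiles=(0.1, 0.5)):
    """Bottom-up construction: cluster the points, then cluster the
    resulting centers with a looser threshold at each level.
    Returns a list of (centers, labels) pairs, one per level; labels[i]
    is the parent index of item i from the level below.
    Simplification: centers are treated as unweighted points when merged."""
    levels = []
    current = [list(p) for p in points]
    for q in quantiles:
        if len(current) < 2:
            break
        threshold = distance_quantile(current, q)
        centers, labels = incremental_kmeans(current, threshold)
        levels.append((centers, labels))
        current = centers
    return levels
```

In this sketch the quantile sequence plays the role of the control described in the abstract: more quantiles give a taller tree, and a larger quantile at a given level merges more children under each node. The brute-force pairwise distance computation is quadratic and is used here only to keep the example short.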

Authors and Affiliations

Jun Wang, Alla Zelenyuk, Dan Imre and Klaus Mueller

Keywords

data management; hierarchy construction; parallel computing; visualization

EP ID EP44093
DOI https://doi.org/10.3390/informatics4030024