Big Data Management with Incremental K-Means Trees–GPU-Accelerated Construction and Visualization

Journal Title: Informatics - Year 2017, Vol 4, Issue 3

Abstract

While big data is revolutionizing scientific research, the tasks of data management and analytics are becoming more challenging than ever. One way to ease the difficulty is to obtain the multilevel hierarchy embedded in the data. Knowing the hierarchy not only reveals the nature of the data; it is also often the first step in big data analytics. However, current algorithms for learning the hierarchy typically do not scale to large volumes of high-dimensional data. To tackle this challenge, we propose a new scalable approach for constructing the tree structure from data. Our method builds the tree in a bottom-up manner, using an adapted incremental k-means. By referencing the distribution of point distances, one can flexibly control the height of the tree and the branching of each node. Dimension reduction is also conducted as a preprocessing step to further improve computing efficiency. The algorithm has a parallel design and is implemented with CUDA (Compute Unified Device Architecture), so that it can be efficiently applied to big data. We test the algorithm with two real-world datasets, and the results are visualized with extended circular dendrograms and other visualization techniques.
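
The abstract does not spell out the algorithm, so the following is only a minimal CPU sketch of the general idea, not the authors' CUDA implementation. It assumes a leader-style incremental k-means in which a point joins its nearest cluster if the distance is within a threshold and otherwise opens a new cluster, and it assumes that thresholds are picked from quantiles of the pairwise distance distribution. The function names (`incremental_kmeans`, `build_tree`, `distance_quantile`) and the thresholding rule are hypothetical choices made for illustration.

```python
import math


def incremental_kmeans(points, threshold):
    """One pass over the data (assumed rule, not taken from the paper):
    a point joins its nearest cluster if within `threshold`,
    otherwise it starts a new cluster."""
    centers, counts, labels = [], [], []
    for p in points:
        # Find the nearest existing center, if any.
        best, best_d = -1, math.inf
        for j, c in enumerate(centers):
            d = math.dist(p, c)
            if d < best_d:
                best, best_d = j, d
        if best >= 0 and best_d <= threshold:
            # Running-mean update of the chosen center.
            counts[best] += 1
            n = counts[best]
            centers[best] = [ci + (pi - ci) / n
                             for ci, pi in zip(centers[best], p)]
        else:
            best = len(centers)
            centers.append([float(x) for x in p])
            counts.append(1)
        labels.append(best)
    return centers, labels


def build_tree(points, thresholds):
    """Bottom-up construction: cluster the points, then cluster the
    resulting centers again with the next (larger) threshold.
    Each level is (centers, labels); labels map the nodes of the
    level below to their parent. Upper-level centers are unweighted
    means of their child centers in this sketch."""
    levels = []
    current = points
    for t in thresholds:
        centers, labels = incremental_kmeans(current, t)
        levels.append((centers, labels))
        current = centers
    return levels


def distance_quantile(points, q):
    """Hypothetical helper: the q-quantile (0 <= q <= 1) of all pairwise
    distances, usable as a threshold. O(n^2), for small samples only;
    needs at least two points."""
    d = sorted(math.dist(a, b)
               for i, a in enumerate(points)
               for b in points[i + 1:])
    return d[min(len(d) - 1, int(q * len(d)))]


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 10.0]]
    for depth, (centers, labels) in enumerate(build_tree(pts, [1.0, 100.0])):
        print(depth, len(centers), labels)
```

In this sketch a larger threshold merges more nodes per level, so the sequence of thresholds controls both the branching factor and the number of levels, which is one plausible reading of how the distance distribution steers the tree shape.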

Authors and Affiliations

Jun Wang, Alla Zelenyuk, Dan Imre and Klaus Mueller

  • EP ID EP44093
  • DOI https://doi.org/10.3390/informatics4030024

How To Cite

Jun Wang, Alla Zelenyuk, Dan Imre and Klaus Mueller (2017). Big Data Management with Incremental K-Means Trees–GPU-Accelerated Construction and Visualization. Informatics, 4(3), -. https://europub.co.uk/articles/-A-44093