Selective Wander Join: Fast Progressive Visualizations for Data Joins

Journal Title: Informatics - Year 2019, Vol 6, Issue 1

Abstract

Progressive visualization offers a great deal of promise for big data visualization; however, current progressive visualization systems do not allow for continuous interaction. What if users want to see more confident results on a subset of the visualization? This can happen when users are in exploratory analysis mode but also want to ask some directed questions of the data. In a progressive visualization system, the online aggregation algorithm, not the user, determines the database sampling rate and the resulting convergence rate. In this paper, we extend Wander Join, a recent online aggregation method optimized for queries that join tables, one of the most computationally expensive operations. This extension leverages importance sampling to enable user-driven sampling when data joins are in the query. We apply user interaction techniques that allow the user to view and adjust the convergence rate, providing more transparency and control over the online aggregation process. By leveraging importance sampling, our extension of Wander Join also allows for stratified sampling of groups when there is data distribution skew. We also improve the convergence rate of filtering queries, but with additional overhead costs not needed in the original Wander Join algorithm.
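
To make the idea in the abstract concrete, the following is a minimal sketch of a random-walk join estimator with importance sampling. It is not the authors' implementation. The two-table schema (customers joined to orders), the function name `selective_wander_join`, and the `region_weight` parameter are hypothetical and chosen only for illustration. Each walk picks a customer row with probability proportional to a user-chosen weight, follows the join index to one matching order, and scales the sampled value by the inverse of the path probability, so the estimate should stay unbiased while the favored group converges faster.

```python
import math
import random
from collections import defaultdict


def selective_wander_join(customers, orders, region_weight, n_walks, seed=0):
    """Estimate SUM(amount) GROUP BY region over customers JOIN orders.

    customers: list of (customer_id, region) rows      (hypothetical schema)
    orders: list of (customer_id, amount) rows         (hypothetical schema)
    region_weight: dict mapping a region to a relative sampling weight;
        regions not listed get weight 1.0 (an assumption of this sketch)
    Returns {region: (estimate, standard_error)}.
    """
    rng = random.Random(seed)

    # Hash index on the join key, standing in for a database index.
    index = defaultdict(list)
    for customer_id, amount in orders:
        index[customer_id].append(amount)

    # Importance distribution over rows of the first table.
    weights = [region_weight.get(region, 1.0) for _, region in customers]
    total_weight = sum(weights)
    row_ids = range(len(customers))

    sums = defaultdict(float)
    square_sums = defaultdict(float)
    for _ in range(n_walks):
        # Step 1: pick a starting row, biased toward the favored regions.
        i = rng.choices(row_ids, weights=weights)[0]
        customer_id, region = customers[i]
        # Step 2: follow the join index to one matching row, uniformly.
        matches = index.get(customer_id, [])
        if not matches:
            continue  # a failed walk contributes zero to every group
        amount = rng.choice(matches)
        # Step 3: reweight by the inverse probability of the sampled path.
        path_prob = (weights[i] / total_weight) * (1.0 / len(matches))
        contribution = amount / path_prob
        sums[region] += contribution
        square_sums[region] += contribution * contribution

    result = {}
    for region in {r for _, r in customers}:
        mean = sums[region] / n_walks
        variance = max(square_sums[region] / n_walks - mean * mean, 0.0)
        result[region] = (mean, math.sqrt(variance / n_walks))
    return result


if __name__ == "__main__":
    customers = [(0, "east"), (1, "east"), (2, "east"), (3, "west")]
    orders = [(0, 10.0), (0, 20.0), (1, 30.0),
              (3, 40.0), (3, 50.0), (3, 60.0)]
    # Favor the "west" group, as a user selecting that bar might.
    print(selective_wander_join(customers, orders, {"west": 3.0}, 20000))
```

In this toy data the "west" group is rare among customers, so uniform sampling reaches it in only a quarter of the walks. Raising its weight shrinks the standard error of that group at the cost of slower convergence for the others, which is the trade-off a user would be steering.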

Authors and Affiliations

Marianne Procopio, Carlos Scheidegger, Eugene Wu and Remco Chang

  • EP ID EP44162
  • DOI https://doi.org/10.3390/informatics6010014

How To Cite

Marianne Procopio, Carlos Scheidegger, Eugene Wu and Remco Chang (2019). Selective Wander Join: Fast Progressive Visualizations for Data Joins. Informatics, 6(1), -. https://europub.co.uk/articles/-A-44162