Utilizing Provenance in Reusable Research Objects
Journal Title: Informatics - Year 2018, Vol 5, Issue 1
Abstract
Science is conducted collaboratively, often requiring the sharing of knowledge about computational experiments. When experiments include only datasets, they can be shared using Uniform Resource Identifiers (URIs) or Digital Object Identifiers (DOIs). An experiment, however, seldom includes only datasets; more often it also includes software, its past executions, provenance, and associated documentation. The Research Object has recently emerged as a comprehensive and systematic method for aggregating and identifying the diverse elements of computational experiments. While necessary, mere aggregation is not sufficient for sharing computational experiments: other users must be able to easily recompute on these shared research objects. Computational provenance is often the key to enabling such reuse. In this paper, we show how reusable research objects can utilize provenance to correctly repeat a previous reference execution, to construct a subset of a research object for partial reuse, and to reuse existing contents of a research object for modified reuse. We describe two methods to summarize provenance that aid in understanding the contents and past executions of a research object. The first method obtains a process-view by collapsing low-level system information, and the second method obtains a summary graph by grouping related nodes and edges, with the goal of obtaining a graph view similar to the application workflow. Through detailed experiments, we show the efficacy and efficiency of our algorithms.
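To make the first summarization idea concrete, the following is a minimal sketch of how a process-view might be derived from a provenance graph. It is not the authors' algorithm: the graph encoding (a dictionary of node kinds plus a list of directed edges), the function name `process_view`, and the example node names are all assumptions chosen for illustration. The sketch hides file nodes and connects two processes whenever data flows from one to the other through one or more files.

```python
from collections import defaultdict


def process_view(nodes, edges):
    """Collapse non-process nodes of a provenance graph (illustrative only).

    nodes: dict mapping node name -> kind, either "process" or "file"
           (hypothetical encoding, not taken from the paper).
    edges: list of (src, dst) pairs meaning data flows from src to dst.
    Returns a sorted list of (process, process) edges.
    """
    successors = defaultdict(set)
    for src, dst in edges:
        successors[src].add(dst)

    view = set()
    for start, kind in nodes.items():
        if kind != "process":
            continue
        # Walk forward through file nodes until another process is reached.
        stack = list(successors[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if nodes[node] == "process":
                if node != start:
                    view.add((start, node))
            else:
                stack.extend(successors[node])
    return sorted(view)


# Hypothetical two-step pipeline: preprocess writes clean.csv, model reads it.
example_nodes = {
    "raw.csv": "file",
    "preprocess": "process",
    "clean.csv": "file",
    "model": "process",
    "out.txt": "file",
}
example_edges = [
    ("raw.csv", "preprocess"),
    ("preprocess", "clean.csv"),
    ("clean.csv", "model"),
    ("model", "out.txt"),
]
print(process_view(example_nodes, example_edges))
```

In this toy graph the five-node provenance trace reduces to a single process-to-process edge, which is the kind of simplification a process-view aims for. A real provenance capture would also record system calls, timestamps, and process spawning, none of which this sketch models.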
Authors and Affiliations
Zhihao Yuan, Dai Hai Ton That, Siddhant Kothari, Gabriel Fils and Tanu Malik