Using Introspection to Collect Provenance in R
Journal Title: Informatics - Year 2018, Vol 5, Issue 1
Abstract
Data provenance is the history of an item of data from the point of its creation to its present state. It can support science by improving understanding of and confidence in data. RDataTracker is an R package that collects data provenance from R scripts (https://github.com/End-to-end-provenance/RDataTracker). In addition to details on inputs, outputs, and the computing environment collected by most provenance tools, RDataTracker also records a detailed execution trace and intermediate data values. It does this using R’s powerful introspection functions and by parsing R statements prior to sending them to the interpreter so it knows what provenance to collect. The provenance is stored in a specialized graph structure called a Data Derivation Graph, which makes it possible to determine exactly how an output value is computed or how an input value is used. In this paper, we provide details about the provenance RDataTracker collects and the mechanisms used to collect it. We also speculate about how this rich source of information could be used by other tools to help an R programmer gain a deeper understanding of the software used and to support reproducibility.
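The parse-then-evaluate idea described above can be illustrated with a minimal sketch in base R. This is not RDataTracker's implementation: the helper name `collect_trace` and the shape of the trace records are assumptions made purely for illustration. It handles only simple top-level assignments. Only the base functions `parse`, `all.vars`, `deparse`, and `eval` are used.

```r
# Illustrative sketch only; not RDataTracker's actual implementation.
# collect_trace() is a hypothetical helper name.
collect_trace <- function(code_text) {
  exprs <- parse(text = code_text)   # parse first, so each statement can be inspected
  env <- new.env()                   # separate environment for the script's variables
  trace <- vector("list", length(exprs))

  for (i in seq_along(exprs)) {
    e <- exprs[[i]]

    # Is this a simple assignment of the form `name <- value` or `name = value`?
    is_assign <- is.call(e) &&
      as.character(e[[1]])[1] %in% c("<-", "=") &&
      is.name(e[[2]])

    output <- if (is_assign) as.character(e[[2]]) else NA_character_

    # Variables read by the statement: the right-hand side of an assignment,
    # otherwise the whole expression. all.vars() omits function names by default.
    inputs <- if (is_assign) all.vars(e[[3]]) else all.vars(e)

    # Only now is the statement handed to the interpreter.
    value <- eval(e, envir = env)

    trace[[i]] <- list(
      statement = paste(deparse(e), collapse = " "),
      inputs = inputs,
      output = output,
      value = value
    )
  }
  trace
}

trace <- collect_trace("x <- 1:3\ny <- sum(x) * 2")
```

In this sketch, `trace[[2]]$inputs` names `x` and `trace[[2]]$output` names `y`, which is the kind of statement-level dependency a Data Derivation Graph would link into edges between data and procedure nodes.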
Authors and Affiliations
Barbara Lerner, Emery Boose and Luis Perez