Using Introspection to Collect Provenance in R
Journal Title: Informatics - Year 2018, Vol 5, Issue 1
Abstract
Data provenance is the history of an item of data from the point of its creation to its present state. It can support science by improving understanding of and confidence in data. RDataTracker is an R package that collects data provenance from R scripts (https://github.com/End-to-end-provenance/RDataTracker). In addition to details on inputs, outputs, and the computing environment collected by most provenance tools, RDataTracker also records a detailed execution trace and intermediate data values. It does this using R’s powerful introspection functions and by parsing R statements prior to sending them to the interpreter so it knows what provenance to collect. The provenance is stored in a specialized graph structure called a Data Derivation Graph, which makes it possible to determine exactly how an output value is computed or how an input value is used. In this paper, we provide details about the provenance RDataTracker collects and the mechanisms used to collect it. We also speculate about how this rich source of information could be used by other tools to help an R programmer gain a deeper understanding of the software used and to support reproducibility.
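The idea of parsing statements before they run, so that a tool knows which variables each statement reads and writes, can be illustrated with a small sketch. It is written in Python with the standard `ast` module purely as an analogy; it is not RDataTracker's code, and the names `statement_provenance` and `derivation_edges` are hypothetical. The edge list it builds is only a toy stand-in for a Data Derivation Graph.

```python
import ast


def statement_provenance(source):
    """Parse each top-level statement and record the names it reads and writes.

    Simplification: only plain name loads and stores are tracked, so an
    augmented assignment such as `x += 1` is recorded as a write only.
    """
    records = []
    for node in ast.parse(source).body:
        reads, writes = set(), set()
        for sub in ast.walk(node):
            if isinstance(sub, ast.Name):
                if isinstance(sub.ctx, ast.Store):
                    writes.add(sub.id)
                elif isinstance(sub.ctx, ast.Load):
                    reads.add(sub.id)
        records.append({
            "code": ast.get_source_segment(source, node),
            "reads": reads,
            "writes": writes,
        })
    return records


def derivation_edges(records):
    """Link each statement to the latest earlier statement that wrote a name it reads.

    Each edge is (producer_index, variable_name, consumer_index).
    """
    last_writer = {}
    edges = []
    for index, record in enumerate(records):
        for name in sorted(record["reads"]):
            if name in last_writer:
                edges.append((last_writer[name], name, index))
        for name in record["writes"]:
            last_writer[name] = index
    return edges


script = "x = 2\ny = x * 3\nz = x + y\n"
records = statement_provenance(script)
edges = derivation_edges(records)
```

Following the edges backward from the statement that assigns `z` shows that it depends on both `x` and `y`, and that `y` in turn depends on `x`. This is the kind of question, how an output value was computed or where an input value was used, that the abstract says the Data Derivation Graph is meant to answer.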
Authors and Affiliations
Barbara Lerner, Emery Boose and Luis Perez
An Internet of Things Based Multi-Level Privacy-Preserving Access Control for Smart Living
The presence of the Internet of Things (IoT) in healthcare through the use of mobile medical applications and wearable devices allows patients to capture their healthcare data and enables healthcare professionals to be...
Big Data in the Era of Health Information Exchanges: Challenges and Opportunities for Public Health
Public health surveillance of communicable diseases depends on timely, complete, accurate, and useful data that are collected across a number of healthcare and public health systems. Health Information Exchanges (HIEs)...
In Search of Smartness: The EU e-Justice Challenge
At the EU level, an increasing number of resources are being invested in an attempt to provide better public services through the use of Information and Communication Technology (ICT). While new tools are being designe...
Storing the Wisdom: Chemical Concepts and Chemoinformatics
The purpose of the paper is to examine the nature of chemical concepts, and the ways in which they are applied in chemoinformatics systems. An account of concepts in philosophy and in the information sciences leads to...
Towards Clustering of Mobile and Smartwatch Accelerometer Data for Physical Activity Recognition
Mobile and wearable devices now have a greater capability of sensing human activity ubiquitously and unobtrusively through advancements in miniaturization and sensing abilities. However, outstanding issues remain around...