Using Introspection to Collect Provenance in R

Journal Title: Informatics - Year 2018, Vol 5, Issue 1

Abstract

Data provenance is the history of an item of data from the point of its creation to its present state. It can support science by improving understanding of, and confidence in, data. RDataTracker is an R package that collects data provenance from R scripts (https://github.com/End-to-end-provenance/RDataTracker). In addition to the details on inputs, outputs, and the computing environment collected by most provenance tools, RDataTracker also records a detailed execution trace and intermediate data values. It does this by using R’s powerful introspection functions and by parsing R statements before sending them to the interpreter, so that it knows what provenance to collect. The provenance is stored in a specialized graph structure called a Data Derivation Graph, which makes it possible to determine exactly how an output value is computed or how an input value is used. In this paper, we provide details about the provenance RDataTracker collects and the mechanisms used to collect it. We also speculate about how this rich source of information could be used by other tools to help an R programmer gain a deeper understanding of the software used and to support reproducibility.
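To make the idea of a Data Derivation Graph more concrete, the following is a minimal sketch in Python, not RDataTracker's actual implementation or API. It assumes simple assignment statements and uses Python's standard `ast` module as a stand-in for R's parser. The class `DDG` and its methods `record`, `lineage`, and `derived_from` are hypothetical names chosen for illustration. Each statement is parsed before it runs so that the variables it reads and writes are known in advance. Executing the statement then creates a procedure node, and each variable it writes gets a fresh data node holding a snapshot of the intermediate value. Walking edges backward answers "how was this output computed?", and walking them forward answers "how was this input used?".

```python
import ast
from collections import defaultdict


class DDG:
    """Hypothetical, simplified Data Derivation Graph (illustration only).

    Procedure nodes ("p1", "p2", ...) stand for executed statements.
    Data nodes ("name#k") stand for individual values of variables.
    Edges: data -> procedure that reads it; procedure -> data it writes.
    """

    def __init__(self):
        self.procs = []                      # procedure node ids in execution order
        self.latest = {}                     # variable name -> its newest data node
        self.inputs_of = defaultdict(list)   # proc id -> data nodes it read
        self.producer_of = {}                # data node -> proc id that wrote it
        self.values = {}                     # data node -> recorded intermediate value
        self._counter = 0

    def record(self, stmt_src, env):
        """Parse a statement before executing it, then record its provenance."""
        tree = ast.parse(stmt_src)
        reads, writes = set(), set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Load):
                    reads.add(node.id)
                elif isinstance(node.ctx, ast.Store):
                    writes.add(node.id)
        proc = f"p{len(self.procs) + 1}"
        self.procs.append(proc)
        # Link the current version of every tracked variable the statement reads.
        for name in sorted(reads):
            if name in self.latest:
                self.inputs_of[proc].append(self.latest[name])
        exec(compile(tree, "<stmt>", "exec"), env)
        # Every variable written gets a new data node with a snapshot of its value.
        for name in sorted(writes):
            self._counter += 1
            data_node = f"{name}#{self._counter}"
            self.latest[name] = data_node
            self.producer_of[data_node] = proc
            self.values[data_node] = env[name]
        return proc

    def lineage(self, name):
        """Backward walk: every data node the latest value of `name` depends on."""
        seen, stack = set(), [self.latest[name]]
        while stack:
            proc = self.producer_of.get(stack.pop())
            if proc is None:
                continue
            for dep in self.inputs_of.get(proc, []):
                if dep not in seen:
                    seen.add(dep)
                    stack.append(dep)
        return seen

    def derived_from(self, data_node):
        """Forward walk: every data node computed, directly or not, from `data_node`."""
        result, frontier = set(), [data_node]
        while frontier:
            current = frontier.pop()
            for proc, deps in self.inputs_of.items():
                if current in deps:
                    for out, producer in self.producer_of.items():
                        if producer == proc and out not in result:
                            result.add(out)
                            frontier.append(out)
        return result


# Usage: trace a tiny "script" one statement at a time.
graph = DDG()
env = {}
for stmt in ["a = 2", "b = 3", "c = a * b", "d = 10", "e = c + 1"]:
    graph.record(stmt, env)
print(sorted(graph.lineage("e")))    # data nodes that e's value was computed from
print(sorted(graph.derived_from("a#1")))  # data nodes that used the first value of a
```

Here `d` never feeds into `e`, so it is absent from the backward lineage of `e`, while the intermediate value of `c` is still available in `graph.values`. This sketch deliberately ignores augmented assignments, function calls with side effects, and file I/O, all of which a real provenance collector such as RDataTracker must handle.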

Authors and Affiliations

Barbara Lerner, Emery Boose and Luis Perez

Related Articles

An Internet of Things Based Multi-Level Privacy-Preserving Access Control for Smart Living

The presence of the Internet of Things (IoT) in healthcare through the use of mobile medical applications and wearable devices allows patients to capture their healthcare data and enables healthcare professionals to be...

Big Data in the Era of Health Information Exchanges: Challenges and Opportunities for Public Health

Public health surveillance of communicable diseases depends on timely, complete, accurate, and useful data that are collected across a number of healthcare and public health systems. Health Information Exchanges (HIEs)...

In Search of Smartness: The EU e-Justice Challenge

At the EU level, an increasing number of resources are being invested in an attempt to provide better public services through the use of Information and Communication Technology (ICT). While new tools are being designe...

Storing the Wisdom: Chemical Concepts and Chemoinformatics

The purpose of the paper is to examine the nature of chemical concepts, and the ways in which they are applied in chemoinformatics systems. An account of concepts in philosophy and in the information sciences leads to...

Towards Clustering of Mobile and Smartwatch Accelerometer Data for Physical Activity Recognition

Mobile and wearable devices now have a greater capability of sensing human activity ubiquitously and unobtrusively through advancements in miniaturization and sensing abilities. However, outstanding issues remain around...

  • EP ID EP44119
  • DOI https://doi.org/10.3390/informatics5010012

How To Cite

Barbara Lerner, Emery Boose and Luis Perez (2018). Using Introspection to Collect Provenance in R. Informatics, 5(1), -. https://europub.co.uk/articles/-A-44119