Using Introspection to Collect Provenance in R

Journal Title: Informatics - Year 2018, Vol 5, Issue 1

Abstract

Data provenance is the history of an item of data from the point of its creation to its present state. It can support science by improving understanding of and confidence in data. RDataTracker is an R package that collects data provenance from R scripts (https://github.com/End-to-end-provenance/RDataTracker). In addition to details on inputs, outputs, and the computing environment collected by most provenance tools, RDataTracker also records a detailed execution trace and intermediate data values. It does this using R’s powerful introspection functions and by parsing R statements prior to sending them to the interpreter so it knows what provenance to collect. The provenance is stored in a specialized graph structure called a Data Derivation Graph, which makes it possible to determine exactly how an output value is computed or how an input value is used. In this paper, we provide details about the provenance RDataTracker collects and the mechanisms used to collect it. We also speculate about how this rich source of information could be used by other tools to help an R programmer gain a deeper understanding of the software used and to support reproducibility.
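The mechanism the abstract describes (inspect each statement before it is executed, record which values it reads and writes, and link them in a graph that can be traversed backwards from an output) can be illustrated with a small sketch. This is not RDataTracker's implementation and it is not R: Python's ast module stands in for R's parser and introspection functions, and the names build_ddg and lineage are hypothetical, chosen only for this illustration. It assumes Python 3.9 or later (for ast.unparse) and handles only simple top-level assignments.

```python
import ast

def build_ddg(source):
    """Parse each top-level statement before running it and record data edges.

    Illustrative assumption: a toy analogue of a Data Derivation Graph,
    not the structure RDataTracker actually stores.
    """
    env = {}        # namespace the script runs in
    producers = {}  # variable name -> index of the statement that last set it
    edges = {}      # statement index -> indices of statements it depends on
    labels = {}     # statement index -> source text
    values = {}     # statement index -> intermediate values it produced

    for i, stmt in enumerate(ast.parse(source).body):
        labels[i] = ast.unparse(stmt)
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        used = {n.id for n in names if isinstance(n.ctx, ast.Load)}
        assigned = {n.id for n in names if isinstance(n.ctx, ast.Store)}

        # Link this statement to the statements that produced its inputs.
        edges[i] = {producers[v] for v in used if v in producers}

        # Only now hand the statement to the interpreter.
        module = ast.Module(body=[stmt], type_ignores=[])
        exec(compile(module, "<script>", "exec"), env)

        # Record intermediate values and update the producer table.
        values[i] = {v: env[v] for v in assigned}
        for v in assigned:
            producers[v] = i

    return {"producers": producers, "edges": edges,
            "labels": labels, "values": values}

def lineage(ddg, var):
    """Return, in execution order, the statements the value of `var` depends on."""
    pending = [ddg["producers"][var]]
    seen = set()
    while pending:
        node = pending.pop()
        if node not in seen:
            seen.add(node)
            pending.extend(ddg["edges"].get(node, ()))
    return [ddg["labels"][n] for n in sorted(seen)]

script = """
a = 2
b = 3
c = a * 10
d = b + 1
total = c + a
"""

ddg = build_ddg(script)
print(lineage(ddg, "total"))  # ['a = 2', 'c = a * 10', 'total = c + a']
```

Walking the edges backwards from total skips the statements that set b and d, which is the kind of "how was this output computed" question the abstract attributes to the Data Derivation Graph. Following the same edges forwards from an input would answer the complementary question of how an input value is used.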

Authors and Affiliations

Barbara Lerner, Emery Boose and Luis Perez

DOI: https://doi.org/10.3390/informatics5010012

How To Cite

Barbara Lerner, Emery Boose and Luis Perez (2018). Using Introspection to Collect Provenance in R. Informatics, 5(1), -. https://europub.co.uk/articles/-A-44119