Using Introspection to Collect Provenance in R

Journal Title: Informatics - Year 2018, Vol 5, Issue 1

Abstract

Data provenance is the history of an item of data from the point of its creation to its present state. It can support science by improving both understanding of and confidence in data. RDataTracker is an R package that collects data provenance from R scripts (https://github.com/End-to-end-provenance/RDataTracker). In addition to the details on inputs, outputs, and the computing environment that most provenance tools collect, RDataTracker also records a detailed execution trace and intermediate data values. It does this by using R’s introspection functions and by parsing each R statement before sending it to the interpreter, so that it knows what provenance to collect. The provenance is stored in a specialized graph structure called a Data Derivation Graph, which makes it possible to determine exactly how an output value was computed or how an input value was used. In this paper, we provide details about the provenance RDataTracker collects and the mechanisms used to collect it. We also speculate about how this rich source of information could be used by other tools to help R programmers gain a deeper understanding of the software they use and to support reproducibility.
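The core mechanism described above, parsing each statement before handing it to the interpreter and linking the variables it reads and writes into a Data Derivation Graph, can be illustrated with a minimal sketch. It is written in Python rather than R and uses Python's standard `ast` module as a stand-in for parsing R statements; it is not RDataTracker's implementation, and the names `names_used_and_defined`, `DDG`, `run`, `derived_from`, and `used_by` are hypothetical. It tracks only top-level variable names and ignores the computing-environment details and fine-grained execution trace that RDataTracker also records.

```python
import ast
import copy


def names_used_and_defined(stmt_src):
    """Parse a statement *before* it runs and report which variables it reads and writes."""
    used, defined = set(), set()
    for node in ast.walk(ast.parse(stmt_src)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                used.add(node.id)
            elif isinstance(node.ctx, ast.Store):
                defined.add(node.id)
        elif isinstance(node, ast.AugAssign) and isinstance(node.target, ast.Name):
            # "x += 1" stores x but also reads its previous value.
            used.add(node.target.id)
    return used, defined


class DDG:
    """Toy Data Derivation Graph (hypothetical, for illustration only):
    procedure nodes ("pN") are executed statements, data nodes ("dN") are
    versions of variables, and edges record which data each statement used
    (dN -> pN) and produced (pN -> dN)."""

    def __init__(self):
        self.proc_nodes = []   # statement source text, in execution order
        self.data_nodes = []   # (variable name, copied snapshot of its value)
        self.edges = []        # (from_node_id, to_node_id)
        self.current = {}      # variable name -> id of its most recent data node

    def run(self, stmt_src, env):
        used, defined = names_used_and_defined(stmt_src)
        proc_id = f"p{len(self.proc_nodes)}"
        self.proc_nodes.append(stmt_src)
        # Link the latest version of each tracked variable the statement reads.
        for name in sorted(used):
            if name in self.current:
                self.edges.append((self.current[name], proc_id))
        exec(stmt_src, env)  # hand the statement to the interpreter
        # Record a new version (an intermediate value) of each variable written.
        for name in sorted(defined):
            if name in env:
                data_id = f"d{len(self.data_nodes)}"
                self.data_nodes.append((name, copy.deepcopy(env[name])))
                self.current[name] = data_id
                self.edges.append((proc_id, data_id))

    def _reach(self, start, forward):
        # Build an adjacency list in the requested direction, then walk it.
        adjacency = {}
        for src, dst in self.edges:
            frm, to = (src, dst) if forward else (dst, src)
            adjacency.setdefault(frm, []).append(to)
        seen, stack = set(), [start]
        while stack:
            for nxt in adjacency.get(stack.pop(), []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def derived_from(self, node_id):
        """Every node an output value was computed from (backward traversal)."""
        return self._reach(node_id, forward=False)

    def used_by(self, node_id):
        """Every node an input value flowed into (forward traversal)."""
        return self._reach(node_id, forward=True)


ddg = DDG()
env = {}
for stmt in ["a = 2", "b = 3", "c = a * b", "d = c + 1", "e = b - 1"]:
    ddg.run(stmt, env)
print(sorted(ddg.derived_from(ddg.current["d"])))
# ['d0', 'd1', 'd2', 'p0', 'p1', 'p2', 'p3']  (the nodes for e are not part of d's history)
```

Walking the edges backward answers "how was this output computed?", and walking them forward answers "how was this input used?", which are the two questions the abstract says a Data Derivation Graph makes answerable.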

Authors and Affiliations

Barbara Lerner, Emery Boose and Luis Perez

Related Articles

Reinforcement Learning for Predictive Analytics in Smart Cities

The digitization of our lives causes a shift in the data production as well as in the required data management. Numerous nodes are capable of producing huge volumes of data in our everyday activities. Sensors, personal...

Social Media Providing an International Virtual Elective Experience for Student Nurses

The advances in social media offer many opportunities for developing understanding of different countries and cultures without any implications of travel. Nursing has a global presence and yet it appears as though stud...

Theory and Practice in Digital Behaviour Change: A Matrix Framework for the Co-Production of Digital Services That Engage, Empower and Emancipate Marginalised People Living with Complex and Chronic Conditions

Background: The WHO framework on integrated people-centred health services promotes a focus on the needs of people and their communities to empower them to have a more active role in their own health. It has advocated...

Older People Using e-Health Services—Exploring Frequency of Use and Associations with Perceived Benefits for Spouse Caregivers

ICT, information- and communication technologies, and e-health services are essential for meeting future care demands. Greater knowledge regarding the implementation of e-health services in long-term care for older peo...

Creating a Multimodal Translation Tool and Testing Machine Translation Integration Using Touch and Voice

Commercial software tools for translation have, until now, been based on the traditional input modes of keyboard and mouse, latterly with a small amount of speech recognition input becoming popular. In order to test wh...

  • EP ID EP44119
  • DOI https://doi.org/10.3390/informatics5010012

How To Cite

Barbara Lerner, Emery Boose and Luis Perez (2018). Using Introspection to Collect Provenance in R. Informatics, 5(1). https://europub.co.uk/articles/-A-44119