LabelFlow Framework for Annotating Workflow Provenance

Journal Title: Informatics - Year 2018, Vol 5, Issue 1

Abstract

Scientists routinely analyse and share data for others to use. Successful data (re)use relies on metadata describing the context in which the data were analysed. In many disciplines the creation of contextual metadata is referred to as reporting. One method of implementing analyses is with workflows. A stand-out feature of workflows is their ability to record provenance from executions. Provenance is useful when analyses are executed with changing parameters (changing contexts) and results need to be traced to their respective parameters. In this paper we investigate whether provenance can be exploited to support reporting. Specifically, we outline a case study based on a real-world workflow and a set of reporting queries. We observe that provenance, as collected from workflow executions, is of limited use for reporting, as it supports the queries only partially. We identify that this is due to the generic nature of provenance, that is, its lack of domain-specific contextual metadata. We observe that the required information is available in implicit form, embedded in data. We describe LabelFlow, a framework composed of four Labelling Operators for decorating provenance with domain-specific Labels. LabelFlow can be instantiated for a domain by plugging in domain-specific metadata extractors. We provide a tool that takes a workflow as input and produces as output a Labelling Pipeline for that workflow, composed of Labelling Operators. We revisit the case study and show how Labels provide a more complete implementation of the reporting queries.
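
The idea of decorating generic provenance with domain-specific Labels through a pluggable extractor can be illustrated with a minimal Python sketch. It is not LabelFlow's actual API, and it does not reproduce the four Labelling Operators, which the abstract does not name. The `Artifact` type, the `key_value_extractor` function, and the `key=value` header format are all assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Artifact:
    """Hypothetical data item recorded in a workflow provenance trace."""
    id: str
    content: str
    labels: Dict[str, str] = field(default_factory=dict)


# An extractor maps raw data content to domain-specific key/value labels.
Extractor = Callable[[str], Dict[str, str]]


def key_value_extractor(content: str) -> Dict[str, str]:
    """Toy extractor (assumed format): 'key=value' pairs on the first line."""
    lines = content.splitlines()
    first_line = lines[0] if lines else ""
    pairs = (item.split("=", 1) for item in first_line.split(";") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def label_trace(trace: List[Artifact], extractor: Extractor) -> None:
    """Decorate every artifact with the labels found embedded in its data."""
    for artifact in trace:
        artifact.labels.update(extractor(artifact.content))


def query_by_label(trace: List[Artifact], key: str, value: str) -> List[str]:
    """Reporting-style query: ids of artifacts carrying a given label."""
    return [a.id for a in trace if a.labels.get(key) == value]


# Invented example data; the metadata is implicit in the data content itself.
trace = [
    Artifact("d1", "species=human;assay=rna-seq\nACGT"),
    Artifact("d2", "species=mouse;assay=rna-seq\nTTGA"),
]
label_trace(trace, key_value_extractor)
```

After labelling, a call such as `query_by_label(trace, "species", "human")` can be answered from the Labels alone, without re-reading the data, which is the kind of reporting query that plain provenance supports only partially.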

Authors and Affiliations

Pinar Alper, Khalid Belhajjame, Vasa Curcin and Carole A. Goble

  • EP ID EP44120
  • DOI https://doi.org/10.3390/informatics5010011

How To Cite

Pinar Alper, Khalid Belhajjame, Vasa Curcin and Carole A. Goble (2018). LabelFlow Framework for Annotating Workflow Provenance. Informatics, 5(1), -. https://europub.co.uk/articles/-A-44120