LabelFlow Framework for Annotating Workflow Provenance
Journal: Informatics, 2018, Vol. 5, Issue 1
Abstract
Scientists routinely analyse and share data for others to use. Successful data (re)use relies on metadata describing the context in which the data were analysed. In many disciplines the creation of such contextual metadata is referred to as reporting. One method of implementing analyses is with workflows. A stand-out feature of workflows is their ability to record provenance from executions. Provenance is useful when analyses are executed with changing parameters (changing contexts) and results need to be traced to their respective parameters. In this paper we investigate whether provenance can be exploited to support reporting. Specifically, we outline a case study based on a real-world workflow and a set of reporting queries. We observe that provenance, as collected from workflow executions, is of limited use for reporting, as it supports the queries only partially. We identify that this is due to the generic nature of provenance, that is, its lack of domain-specific contextual metadata. We observe that the required information is available in implicit form, embedded in data. We describe LabelFlow, a framework comprising four Labelling Operators for decorating provenance with domain-specific Labels. LabelFlow can be instantiated for a domain by plugging in domain-specific metadata extractors. We provide a tool that takes a workflow as input and produces as output a Labelling Pipeline for that workflow, composed of Labelling Operators. We revisit the case study and show how Labels provide a more complete implementation of the reporting queries.
Authors and Affiliations
Pinar Alper, Khalid Belhajjame, Vasa Curcin and Carole A. Goble