A Data Quality Strategy to Enable FAIR, Programmatic Access across Large, Diverse Data Collections for High Performance Data Analysis

Journal Title: Informatics - Year 2017, Vol 4, Issue 4

Abstract

To ensure seamless, programmatic access to data for High Performance Computing (HPC) and analysis across multiple research domains, it is vital to have a methodology for standardizing both data and services. At the Australian National Computational Infrastructure (NCI) we have developed a Data Quality Strategy (DQS) that currently provides processes for: (1) Consistency of data structures needed for a High Performance Data (HPD) platform; (2) Quality Control (QC) through compliance with recognized community standards; (3) Benchmarking cases of operational performance tests; and (4) Quality Assurance (QA) of data through demonstrated functionality and performance across common platforms, tools and services. By implementing the NCI DQS, we have seen progressive improvement in the quality and usefulness of the datasets across the different subject domains, and have demonstrated the ease with which modern programmatic methods can be used to access the data, either in situ or via web services, for uses ranging from traditional analysis methods through to emerging machine learning techniques. To help increase data re-usability by broader communities, particularly in high performance environments, the DQS is also used to identify the need for any extensions to the relevant international standards for interoperability and/or programmatic access.
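As an illustration of what the QC step (2) might look like in practice, the following minimal Python sketch checks a dataset's global metadata against a list of required attributes. It is not NCI's implementation: the attribute list `REQUIRED_ATTRS`, the `QCReport` class and the `check_metadata` function are hypothetical names chosen for this example, and a real compliance check against a community standard would cover far more than attribute presence.

```python
from dataclasses import dataclass, field

# Hypothetical rule set (an assumption for this sketch): global attributes
# that a dataset's metadata is expected to carry.
REQUIRED_ATTRS = ("title", "summary", "Conventions", "license")


@dataclass
class QCReport:
    """Outcome of a metadata check for one dataset."""
    dataset: str
    missing: list = field(default_factory=list)  # attributes not present at all
    empty: list = field(default_factory=list)    # attributes present but blank

    @property
    def passed(self) -> bool:
        return not self.missing and not self.empty


def check_metadata(dataset: str, attrs: dict, required=REQUIRED_ATTRS) -> QCReport:
    """Record which required attributes are missing or blank in `attrs`."""
    report = QCReport(dataset)
    for name in required:
        if name not in attrs:
            report.missing.append(name)
        elif not str(attrs[name]).strip():
            report.empty.append(name)
    return report


if __name__ == "__main__":
    # Illustrative metadata only; not taken from any real collection.
    example = {"title": "Example grid", "summary": "", "Conventions": "CF-1.6"}
    report = check_metadata("example.nc", example)
    print(report.passed, report.missing, report.empty)
```

Running the same check over every file in a collection and aggregating the reports gives a simple, repeatable measure of how closely the collection follows the chosen conventions.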

Authors and Affiliations

Ben Evans, Kelsey Druken, Jingbo Wang, Rui Yang, Clare Richards and Lesley Wyborn

  • EP ID EP44103
  • DOI https://doi.org/10.3390/informatics4040045

How To Cite

Ben Evans, Kelsey Druken, Jingbo Wang, Rui Yang, Clare Richards and Lesley Wyborn (2017). A Data Quality Strategy to Enable FAIR, Programmatic Access across Large, Diverse Data Collections for High Performance Data Analysis. Informatics, 4(4), -. https://europub.co.uk/articles/-A-44103