A Data Quality Strategy to Enable FAIR, Programmatic Access across Large, Diverse Data Collections for High Performance Data Analysis
Journal Title: Informatics - Year 2017, Vol 4, Issue 4
Abstract
To ensure seamless, programmatic access to data for High Performance Computing (HPC) and analysis across multiple research domains, it is vital to have a methodology for standardizing both data and services. At the Australian National Computational Infrastructure (NCI) we have developed a Data Quality Strategy (DQS) that currently provides processes for: (1) consistency of the data structures needed for a High Performance Data (HPD) platform; (2) Quality Control (QC) through compliance with recognized community standards; (3) benchmarking cases for operational performance tests; and (4) Quality Assurance (QA) of data through demonstrated functionality and performance across common platforms, tools and services. By implementing the NCI DQS, we have seen progressive improvement in the quality and usefulness of the datasets across the different subject domains, and have demonstrated the ease with which modern programmatic methods can be used to access the data, either in situ or via web services, for uses ranging from traditional analysis methods through to emerging machine learning techniques. To help increase data re-usability by broader communities, particularly in high performance environments, the DQS is also used to identify where the relevant international standards need extensions for interoperability and/or programmatic access.
Authors and Affiliations
Ben Evans, Kelsey Druken, Jingbo Wang, Rui Yang, Clare Richards and Lesley Wyborn