A Data Quality Strategy to Enable FAIR, Programmatic Access across Large, Diverse Data Collections for High Performance Data Analysis
Journal Title: Informatics - Year 2017, Vol 4, Issue 4
Abstract
To ensure seamless, programmatic access to data for High Performance Computing (HPC) and analysis across multiple research domains, it is vital to have a methodology for standardizing both data and services. At the Australian National Computational Infrastructure (NCI), we have developed a Data Quality Strategy (DQS) that currently provides processes for: (1) consistency of the data structures needed for a High Performance Data (HPD) platform; (2) Quality Control (QC) through compliance with recognized community standards; (3) benchmarking cases of operational performance tests; and (4) Quality Assurance (QA) of data through demonstrated functionality and performance across common platforms, tools and services. By implementing the NCI DQS, we have seen progressive improvement in the quality and usefulness of the datasets across the different subject domains, and have demonstrated the ease with which modern programmatic methods can be used to access the data, either in situ or via web services, for uses ranging from traditional analysis methods through to emerging machine learning techniques. To help increase data reusability by broader communities, particularly in high performance environments, the DQS is also used to identify where the relevant international standards for interoperability and/or programmatic access need to be extended.
Authors and Affiliations
Ben Evans, Kelsey Druken, Jingbo Wang, Rui Yang, Clare Richards and Lesley Wyborn