ETL Best Practices for Data Quality Checks in RIS Databases

Journal Title: Informatics - Year 2019, Vol 6, Issue 1

Abstract

Data integration from external data sources or independent IT systems has recently received increasing attention, both in IT departments and at management level, particularly data integration in federated database systems. Commercial research information systems (RIS) are one example of the latter: they regularly import research information from a variety of databases and cleanse, transform and prepare it for analysis. Each of these steps must also be carried out with assured quality. Because several internal and external data sources are loaded into the RIS for integration, ensuring information quality is becoming increasingly challenging for research institutions. Before research information is transferred to a RIS, it must be checked and cleaned. Data quality is therefore always a key factor for successful data integration. Removing data errors (such as duplicates, inconsistent data and outdated data) and harmonizing the data structure are essential tasks of data integration using extract, transform, and load (ETL) processes: data is extracted from the source systems, transformed, and loaded into the RIS. During this process, conflicts between different data sources are detected and resolved, and data quality issues are eliminated. Against this background, our paper presents the data transformation process in the context of RIS, providing an overview of the quality of research information from an institution's internal and external data sources during its integration into the RIS. In addition, it addresses how quality issues can be controlled and improved during the integration process in RIS.
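The ETL cycle summarized in the abstract (extract from several sources, harmonize and check the records, then load only clean data into the RIS) can be illustrated with a small Python sketch. This is a minimal example under stated assumptions, not the authors' implementation: the two sources, their field names (`DOI`, `Title`, `pub_year`, ...), the rule that records sharing a normalized DOI are duplicates, and the 1900-to-current-year plausibility range are all hypothetical, and an in-memory SQLite table stands in for the RIS.

```python
import sqlite3
from datetime import date

# Hypothetical DOI prefixes that some sources prepend; stripped during harmonization.
DOI_PREFIXES = ("https://doi.org/", "http://dx.doi.org/", "doi:")


def extract():
    """Extract: two hypothetical sources with inconsistent field names and formats."""
    internal = [
        {"DOI": "10.1000/ABC", "Title": "  Data Quality in RIS ", "Year": "2018"},
        {"DOI": "10.1000/xyz", "Title": "ETL Pipelines", "Year": "n/a"},
    ]
    external = [
        {"doi": "https://doi.org/10.1000/abc", "title": "Data quality in RIS", "pub_year": 2018},
        {"doi": "10.1000/def", "title": "Federated Databases", "pub_year": 2017},
    ]
    # Map both schemas onto one common (hypothetical) record layout.
    rows = [{"doi": r["DOI"], "title": r["Title"], "year": r["Year"], "source": "internal"}
            for r in internal]
    rows += [{"doi": r["doi"], "title": r["title"], "year": r["pub_year"], "source": "external"}
             for r in external]
    return rows


def normalize_doi(raw):
    """Harmonize a DOI: trim, lowercase, and drop any resolver prefix."""
    doi = str(raw).strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def parse_year(value):
    """Return the year as an int, or None if it is missing or implausible."""
    try:
        year = int(value)
    except (TypeError, ValueError):
        return None
    return year if 1900 <= year <= date.today().year else None


def transform(rows):
    """Transform: harmonize fields, reject incomplete records and duplicates."""
    clean, rejected, seen = [], [], set()
    for row in rows:
        doi = normalize_doi(row["doi"])
        title = " ".join(str(row["title"]).split())  # collapse stray whitespace
        year = parse_year(row["year"])
        if not doi or not title or year is None:
            rejected.append(("incomplete or implausible", row))
            continue
        if doi in seen:  # first occurrence wins; internal sources are listed first
            rejected.append(("duplicate", row))
            continue
        seen.add(doi)
        clean.append((doi, title, year, row["source"]))
    return clean, rejected


def load(conn, records):
    """Load: write only the checked records into the (stand-in) RIS table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS publications ("
        "doi TEXT PRIMARY KEY, title TEXT NOT NULL, year INTEGER NOT NULL, source TEXT)"
    )
    conn.executemany("INSERT INTO publications VALUES (?, ?, ?, ?)", records)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    clean, rejected = transform(extract())
    load(conn, clean)
    print(conn.execute("SELECT doi, year, source FROM publications ORDER BY doi").fetchall())
    for reason, row in rejected:
        print("rejected:", reason, row["doi"])
```

In this sketch, rejected records are collected with a reason rather than silently discarded, so they could be reviewed and corrected at the source; the `PRIMARY KEY` on `doi` acts as a second safeguard against duplicates reaching the table.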

Authors and Affiliations

Otmane Azeroual, Gunter Saake and Mohammad Abuosba

DOI: https://doi.org/10.3390/informatics6010010

How To Cite

Otmane Azeroual, Gunter Saake and Mohammad Abuosba (2019). ETL Best Practices for Data Quality Checks in RIS Databases. Informatics, 6(1), -. https://europub.co.uk/articles/-A-44166