ETL Best Practices for Data Quality Checks in RIS Databases

Journal Title: Informatics - Year 2019, Vol 6, Issue 1

Abstract

The integration of data from external sources or independent IT systems has recently received increasing attention in IT departments as well as at management level, in particular with regard to data integration in federated database systems. One example of such systems are commercial research information systems (RIS), which regularly import, cleanse, transform, and prepare for analysis the research information of institutions from a variety of databases. Each of these steps must be carried out with assured quality. As several internal and external data sources are loaded for integration into the RIS, ensuring information quality is becoming increasingly challenging for research institutions. Before research information is transferred to a RIS, it must be checked and cleaned. Data quality is therefore always a decisive factor for successful data integration. Removing data errors (such as duplicates, inconsistent data, and outdated data) and harmonizing the data structure are essential tasks of data integration using extract, transform, and load (ETL) processes. Data is extracted from the source systems, transformed, and loaded into the RIS. At this point, conflicts between different data sources are detected and resolved, and data quality issues that arise during integration are eliminated. Against this background, this paper presents the process of data transformation in the context of RIS, which provides an overview of the quality of research information in an institution’s internal and external data sources during its integration into the RIS. In addition, it addresses the question of how to control and improve quality issues during the integration process in RIS.
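The ETL steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two sample sources, the field names, the `MIN_YEAR` plausibility bound, and the choice of the DOI as the duplicate key are all assumptions made for illustration, and an in-memory dictionary stands in for the RIS.

```python
from datetime import date

# Hypothetical sample data: two sources with different field naming.
INTERNAL = [
    {"title": "ETL in RIS ", "doi": "10.1000/ABC", "year": 2019},
    {"title": "Old report", "doi": "10.1000/old", "year": 1890},
]
EXTERNAL = [
    {"Title": "ETL in RIS", "DOI": "https://doi.org/10.1000/abc", "Year": "2019"},
    {"Title": "", "DOI": "10.1000/xyz", "Year": "2018"},
]

MIN_YEAR = 1900  # assumed plausibility bound, not taken from the paper
DOI_PREFIX = "https://doi.org/"


def extract():
    """Extract: read the records of every source system in turn."""
    yield from INTERNAL
    yield from EXTERNAL


def harmonize(record):
    """Transform: map a source record onto one common structure."""
    fields = {key.lower(): value for key, value in record.items()}
    doi = str(fields.get("doi", "")).strip().lower()
    if doi.startswith(DOI_PREFIX):
        doi = doi[len(DOI_PREFIX):]
    return {
        "title": str(fields.get("title", "")).strip(),
        "doi": doi,
        "year": int(fields.get("year", 0)),
    }


def is_valid(record):
    """Quality check: required fields present and year plausible."""
    return (
        bool(record["title"])
        and bool(record["doi"])
        and MIN_YEAR <= record["year"] <= date.today().year
    )


def run_etl():
    """Run extract, transform, and load; return the target and a report."""
    ris = {}  # Load target: an in-memory stand-in for the RIS
    report = {"loaded": 0, "duplicates": 0, "rejected": 0}
    for raw in extract():
        record = harmonize(raw)
        if not is_valid(record):
            report["rejected"] += 1
        elif record["doi"] in ris:
            report["duplicates"] += 1  # first occurrence wins
        else:
            ris[record["doi"]] = record
            report["loaded"] += 1
    return ris, report


if __name__ == "__main__":
    ris, report = run_etl()
    print(report)  # {'loaded': 1, 'duplicates': 1, 'rejected': 2}
```

Keeping the first record for a duplicate key is the simplest possible conflict rule. A real integration would more likely merge fields or rank sources by trustworthiness, and would log rejected records for correction rather than only counting them.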

Authors and Affiliations

Otmane Azeroual, Gunter Saake and Mohammad Abuosba

  • DOI https://doi.org/10.3390/informatics6010010

How To Cite

Otmane Azeroual, Gunter Saake and Mohammad Abuosba (2019). ETL Best Practices for Data Quality Checks in RIS Databases. Informatics, 6(1), -. https://europub.co.uk/articles/-A-44166