ETL Best Practices for Data Quality Checks in RIS Databases

Journal Title: Informatics - Year 2019, Vol 6, Issue 1

Abstract

The integration of data from external data sources and independent IT systems has recently received growing attention, both in IT departments and at management level, particularly in federated database systems. Commercial research information systems (RIS) are one example of the latter: they regularly import research information about an institution from a variety of databases, then cleanse, transform, and prepare it for analysis. Each of these steps must deliver data of assured quality. Because several internal and external data sources are loaded into a RIS, ensuring information quality is becoming increasingly challenging for research institutions, and research information must be checked and cleaned before it is transferred to the RIS. Data quality is therefore always a decisive factor for successful data integration. Removing data errors (such as duplicates, inconsistent data, and outdated data) and harmonizing the data structure are essential tasks of data integration with extract, transform, and load (ETL) processes, in which data is extracted from the source systems, transformed, and loaded into the RIS. At this stage, conflicts between different data sources are managed and resolved, and data quality issues that arise during integration are eliminated. Against this background, this paper presents the data transformation process in the context of RIS, giving an overview of the quality of research information from an institution's internal and external data sources during its integration into the RIS. It also addresses how quality issues can be controlled and improved during the integration process.
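The quality checks described in the abstract can be illustrated with a small ETL sketch. This is a hypothetical example, not the authors' method: the source names (`cris_export`, `library_feed`), the field mappings, the plausible-year rule, and the "most recently updated version wins" conflict policy are all assumptions chosen for illustration. The sketch harmonizes field names, rejects incomplete or implausible records, merges duplicates by DOI, and loads the result into an in-memory SQLite table standing in for the RIS.

```python
import sqlite3

# Hypothetical mappings from each source's field names to one common RIS schema.
FIELD_MAPS = {
    "cris_export": {"Title": "title", "DOI": "doi", "PubYear": "year", "Updated": "updated"},
    "library_feed": {"name": "title", "identifier": "doi", "year": "year", "modified": "updated"},
}
REQUIRED = ("title", "doi", "year", "updated")


def harmonize(raw, source):
    """Transform step 1: rename fields and normalize values (whitespace, DOI case, year type)."""
    rec = {target: raw.get(src) for src, target in FIELD_MAPS[source].items()}
    if rec["doi"]:
        rec["doi"] = rec["doi"].strip().lower()
    if rec["title"]:
        rec["title"] = " ".join(rec["title"].split())
    if rec["year"] not in (None, ""):
        rec["year"] = int(rec["year"])
    rec["source"] = source
    return rec


def quality_problems(rec, current_year):
    """Transform step 2: list completeness and plausibility problems in one record."""
    problems = [f"missing {field}" for field in REQUIRED if not rec.get(field)]
    year = rec.get("year")
    if isinstance(year, int) and not 1900 <= year <= current_year:
        problems.append(f"implausible year {year}")
    return problems


def transform(batches, current_year):
    """Harmonize, check, and de-duplicate records; return (clean, rejected)."""
    by_doi, rejected = {}, []
    for source, records in batches:
        for raw in records:
            rec = harmonize(raw, source)
            problems = quality_problems(rec, current_year)
            if problems:
                rejected.append((rec, problems))
                continue
            # Duplicate/conflict resolution (assumed policy): the most recently
            # updated version of a DOI wins; ISO dates compare correctly as strings.
            old = by_doi.get(rec["doi"])
            if old is None or rec["updated"] > old["updated"]:
                by_doi[rec["doi"]] = rec
    return list(by_doi.values()), rejected


def load(conn, records):
    """Load step: write clean records into a RIS publication table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS publication "
        "(doi TEXT PRIMARY KEY, title TEXT, year INTEGER, updated TEXT, source TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO publication VALUES (:doi, :title, :year, :updated, :source)",
        records,
    )
    conn.commit()


# Extract step, simulated with hypothetical in-memory exports from two sources.
cris = [
    {"Title": "ETL  Best Practices", "DOI": "10.1000/ris.001", "PubYear": "2019", "Updated": "2019-02-01"},
    {"Title": "Untitled draft", "DOI": "", "PubYear": "2019", "Updated": "2019-01-05"},
]
library = [
    {"name": "ETL Best Practices", "identifier": " 10.1000/RIS.001 ", "year": 2019, "modified": "2019-03-15"},
    {"name": "Old record", "identifier": "10.1000/ris.002", "year": 2091, "modified": "2018-06-30"},
]

clean, rejected = transform([("cris_export", cris), ("library_feed", library)], current_year=2019)
conn = sqlite3.connect(":memory:")
load(conn, clean)
for rec, problems in rejected:
    print("rejected:", rec["title"], problems)
print(conn.execute("SELECT doi, title, source FROM publication").fetchall())
```

Rejected records are kept together with their problem list instead of being silently dropped, so they can be reported back to the source system for correction. A production RIS pipeline would need institution-specific rules and many more checks than this sketch shows.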

Authors and Affiliations

Otmane Azeroual, Gunter Saake and Mohammad Abuosba

DOI: https://doi.org/10.3390/informatics6010010

How To Cite

Otmane Azeroual, Gunter Saake and Mohammad Abuosba (2019). ETL Best Practices for Data Quality Checks in RIS Databases. Informatics, 6(1), -. https://europub.co.uk/articles/-A-44166