Automatic RDF-ization of big data semi-structured datasets

Journal Title: MASKANA - Year 2016, Vol 7, Issue 3

Abstract

Linked data adoption continues to grow at a considerable pace in many fields. However, some of the most important datasets remain underexploited for two main reasons: their huge volume and the lack of methods for automatic conversion to RDF. This paper presents an automatic approach that tackles both problems by combining recent Big Data tools with a program for automatic conversion from a relational model to RDF. The process consists of three steps: 1) bulk transfer of data from different sources to Hive/HDFS; 2) transformation of the data in Hive to RDF using D2RQ; and 3) storage of the resulting RDF in CumulusRDF. By relying on these Big Data tools, the platform can handle large amounts of data from different sources, whether structured or semi-structured. Moreover, because the RDF data are stored in CumulusRDF in the final step, users and applications can consume the resulting data through web services or SPARQL queries. Finally, an evaluation in the hydro-meteorological domain demonstrates the soundness of our approach.
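To make the second step more concrete, the following is a minimal sketch of what a relational-to-RDF mapping does in principle: each row becomes a subject, and each non-null column becomes a property with a literal value. It is not the D2RQ implementation and does not use the D2RQ API. The base URI, the table and column names, and the function names are hypothetical and chosen only for illustration, and the sketch assumes that key values are safe to embed in a URI.

```python
from typing import Any, Iterable, Iterator, Mapping

BASE = "http://example.org/hydromet/"  # hypothetical base URI, not from the paper
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"


def escape_literal(value: str) -> str:
    """Escape the characters that may not appear raw in an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def literal(value: Any) -> str:
    """Render a Python value as an N-Triples literal, typed where possible."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return f'"{str(value).lower()}"^^<{XSD}boolean>'
    if isinstance(value, int):
        return f'"{value}"^^<{XSD}integer>'
    if isinstance(value, float):
        return f'"{value}"^^<{XSD}double>'
    return f'"{escape_literal(str(value))}"'


def rows_to_ntriples(
    table: str, key: str, rows: Iterable[Mapping[str, Any]]
) -> Iterator[str]:
    """Yield one rdf:type triple per row plus one triple per non-null column."""
    for row in rows:
        subject = f"<{BASE}{table}/{row[key]}>"
        yield f"{subject} <{RDF_TYPE}> <{BASE}vocab/{table}> ."
        for column, value in row.items():
            if column == key or value is None:
                continue  # the key is already in the subject URI; NULLs emit nothing
            yield f"{subject} <{BASE}vocab/{table}_{column}> {literal(value)} ."
```

Calling `rows_to_ntriples("observation", "id", rows)` on rows fetched from a table yields N-Triples lines that could then be loaded into an RDF store. In the approach described above, this role is played by D2RQ over Hive, with CumulusRDF as the store.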

Authors and Affiliations

Ronald Gualán, Renán Freire, Andrés Tello, Mauricio Espinoza, Víctor Saquicela

How To Cite

Ronald Gualán, Renán Freire, Andrés Tello, Mauricio Espinoza, Víctor Saquicela (2016). Automatic RDF-ization of big data semi-structured datasets. MASKANA, 7(3), -. https://europub.co.uk/articles/-A-42148