Efficient Distributed SPARQL Queries on Apache Spark

Abstract

RDF is a widely accepted framework for describing metadata on the web, owing to its simplicity and its universal, graph-like data model. As the volume of RDF data grows, existing query techniques become unsuitable. To address this, we use the processing power of Apache Spark to load and query a large dataset much more quickly than classical approaches. In this paper, we design experiments to evaluate the performance of several queries, ranging from selection of a single attribute to selection, filtering, and sorting over multiple attributes in the dataset. We also run distributed SPARQL queries on Apache Spark GraphX and study the stages of this pipeline, which gives insight into which stages can be improved. The query pipeline comprises three stages: graph loading, Basic Graph Pattern (BGP) matching, and result calculation. Our goal is to minimize the time spent in the graph loading stage in order to improve overall performance and cut the cost of data loading.
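The three stages named in the abstract can be illustrated with a minimal, single-machine sketch. This is not the paper's implementation and does not use Spark or GraphX; it is a plain-Python illustration under stated assumptions. The toy triples, the function names (`load_graph`, `match_bgp`, `compute_results`, `run_query`), and the restriction that every triple pattern has a constant predicate are all hypothetical choices made for the example. The sketch times each stage separately, which is the kind of per-stage measurement the abstract describes.

```python
import time

# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "age", "30"),
    ("bob", "age", "25"),
    ("carol", "age", "41"),
]


def load_graph(triples):
    """Stage 1 (graph loading): index the triples by predicate."""
    index = {}
    for s, p, o in triples:
        index.setdefault(p, []).append((s, o))
    return index


def is_var(term):
    """Variables are written SPARQL-style, with a leading '?'."""
    return term.startswith("?")


def match_bgp(index, patterns):
    """Stage 2 (basic graph pattern): join triple patterns one at a time.

    Assumption: the predicate of every pattern is a constant.
    """
    bindings = [{}]
    for s_term, pred, o_term in patterns:
        extended = []
        for binding in bindings:
            for s, o in index.get(pred, []):
                candidate = dict(binding)
                ok = True
                for term, value in ((s_term, s), (o_term, o)):
                    if is_var(term):
                        # Bind the variable, or check an existing binding.
                        if candidate.setdefault(term, value) != value:
                            ok = False
                            break
                    elif term != value:
                        ok = False
                        break
                if ok:
                    extended.append(candidate)
        bindings = extended
    return bindings


def compute_results(bindings, select, order_by=None, keep=None):
    """Stage 3 (result calculation): filter, sort, then project."""
    rows = [b for b in bindings if keep is None or keep(b)]
    if order_by is not None:
        rows.sort(key=lambda b: b[order_by])
    return [tuple(b[v] for v in select) for b in rows]


def run_query(triples, patterns, select, order_by=None, keep=None):
    """Run the three stages and record the wall-clock time of each."""
    timings = {}
    start = time.perf_counter()
    index = load_graph(triples)
    timings["load"] = time.perf_counter() - start

    start = time.perf_counter()
    bindings = match_bgp(index, patterns)
    timings["bgp"] = time.perf_counter() - start

    start = time.perf_counter()
    rows = compute_results(bindings, select, order_by, keep)
    timings["result"] = time.perf_counter() - start
    return rows, timings


if __name__ == "__main__":
    patterns = [("?x", "knows", "?y"), ("?y", "age", "?a")]
    rows, timings = run_query(
        TRIPLES, patterns, select=["?x", "?y", "?a"], order_by="?a"
    )
    print(rows)
    print(timings)
```

On the toy data, the two-pattern query binds `?x`, `?y`, and `?a` for each person and the age of the person they know. In a distributed setting the loading stage would also include reading and partitioning the data, which is why the abstract singles it out as the stage to optimize.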

Authors and Affiliations

Saleh Albahli

DOI: 10.14569/IJACSA.2019.0100874

How To Cite

Saleh Albahli (2019). Efficient Distributed SPARQL Queries on Apache Spark. International Journal of Advanced Computer Science & Applications, 10(8), 564-568. https://europub.co.uk/articles/-A-626847