Benchmarking overlapping communication and computations with multiple streams for modern GPUs
Journal Title: Annals of Computer Science and Information Systems - Year 2018, Vol 17, Issue
Abstract
The paper benchmarks a multi-stream application that processes a set of input data arrays. Execution times were measured for various numbers of streams and various compute intensities, where compute intensity is the ratio of kernel compute time to data transfer time. As such, the application and the benchmarks are representative of frequently used operations such as vector weighted sum, matrix multiplication, etc. The paper shows the benefits of using multiple data streams, compared to one stream, for various compute intensities, benchmarked on four GPUs: the professional NVIDIA Tesla V100 and Tesla K20m, the desktop GTX 1060, and the mobile GeForce 940MX. Additionally, relative performance is shown for various numbers of kernel computations on these GPUs.
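To give a feel for why compute intensity matters when transfers and kernels overlap, the following is a minimal sketch of an idealized two-stage pipeline model. It is not the paper's benchmark or its measured results. The function names and the model itself are assumptions made for illustration: each array needs one transfer and one kernel, transfers serialize with each other, kernels serialize with each other, and with enough streams a transfer can fully overlap with a kernel working on another array.

```python
def serial_time(n_arrays, t_transfer, t_kernel):
    """One stream: every transfer and kernel runs back to back."""
    return n_arrays * (t_transfer + t_kernel)


def overlapped_time(n_arrays, t_transfer, t_kernel):
    """Idealized multi-stream bound (assumed model, not measured data).

    The first transfer and the last kernel cannot be hidden; in between,
    the slower of the two stages dominates each step of the pipeline.
    """
    if n_arrays == 0:
        return 0.0
    return t_transfer + t_kernel + (n_arrays - 1) * max(t_transfer, t_kernel)


def ideal_speedup(n_arrays, intensity, t_transfer=1.0):
    """Speedup of the overlapped model over one stream.

    `intensity` is kernel compute time divided by data transfer time,
    matching the definition of compute intensity in the abstract.
    """
    t_kernel = intensity * t_transfer
    return (serial_time(n_arrays, t_transfer, t_kernel)
            / overlapped_time(n_arrays, t_transfer, t_kernel))


if __name__ == "__main__":
    # Hypothetical intensities chosen only to show the trend.
    for intensity in (0.1, 1.0, 10.0):
        print(intensity, round(ideal_speedup(100, intensity), 3))
```

Under these assumptions the benefit is largest when kernel time and transfer time are comparable (intensity near 1, approaching a factor of 2 for many arrays) and shrinks when either stage dominates. Real GPUs differ in the number of copy engines and in launch overheads, so measured speedups can deviate from this bound.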
Authors and Affiliations
Paweł Czarnul