[Openmp-commits] [PATCH] D74145: [OpenMP][Offloading] Added support for multiple streams so that multiple kernels can be executed concurrently
Shilei Tian via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Sat Feb 8 13:45:14 PST 2020
tianshilei1992 added a comment.
I ran a small experiment to show the performance improvement. Here is the microbenchmark:
#include <math.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void kernel() {
  const int num_threads = 64;
#pragma omp parallel for
  for (int i = 0; i < num_threads; ++i) {
    const size_t N = 1UL << 10;
#pragma omp target teams distribute parallel for
    for (size_t i = 0; i < N; ++i) {
      for (size_t j = 0; j < N / 2; ++j) {
        float x = sqrt(pow(3.14159, j));
      }
    }
  }
}

int main(int argc, char *argv[]) {
  const int N = 1000;
  const clock_t start = clock();
  for (int i = 0; i < N; ++i) {
    kernel();
  }
  const clock_t duration = (clock() - start) * 1000 / CLOCKS_PER_SEC / N;
  printf("Avg time: %ld ms\n", duration);
  return 0;
}
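A note on the timing: clock() reports processor time used by the program rather than elapsed time, and on Linux this is typically summed across all host threads, so the average printed above is not necessarily wall-clock time. A minimal sketch of an elapsed-time measurement follows, assuming a POSIX system that provides clock_gettime. The names now_ms, avg_ms_per_iter, and sample_workload are hypothetical helpers for illustration and are not part of the patch or the benchmark.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Elapsed (wall-clock) time in milliseconds, read from a monotonic clock. */
double now_ms(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1.0e6;
}

/* Average elapsed milliseconds per call over `iters` calls of `fn`. */
double avg_ms_per_iter(void (*fn)(void), int iters) {
  assert(iters > 0);
  const double start = now_ms();
  for (int i = 0; i < iters; ++i) {
    fn();
  }
  return (now_ms() - start) / iters;
}

/* Hypothetical stand-in workload; the benchmark's kernel() would go here. */
void sample_workload(void) {
  volatile double acc = 0.0;
  for (int i = 0; i < 1000; ++i) {
    acc += i;
  }
}
```

Under these assumptions, calling avg_ms_per_iter(kernel, 1000) in main would replace the clock()-based computation. The numbers reported in this comment were taken with clock(), so they are not directly comparable with values from this helper.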
The execution result with multiple streams is:
$ /usr/local/cuda/bin/nvprof --output-profile parallel_offloading_ms.prof -f ./parallel_offloading
==32397== NVPROF is profiling process 32397, command: ./parallel_offloading
Avg time: 1081 ms
==32397== Generated result file: /home/shiltian/Documents/project/multiple_streams/tests/multistreams/parallel_offloading_ms.prof
And the result without multiple streams is:
$ /usr/local/cuda/bin/nvprof --output-profile parallel_offloading.prof -f ./parallel_offloading
==35547== NVPROF is profiling process 35547, command: ./parallel_offloading
Avg time: 5825 ms
==35547== Generated result file: /home/shiltian/Documents/project/multiple_streams/tests/multistreams/parallel_offloading.prof
The average time drops from 5825 ms to 1081 ms, a speedup of approximately 5.4x.
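The speedup figure is simply the ratio of the two averages. A small sketch of that arithmetic, where the function name speedup is a hypothetical helper and not part of the patch:

```c
#include <assert.h>

/* Speedup of the optimized run relative to the baseline run. */
double speedup(double baseline_ms, double optimized_ms) {
  assert(optimized_ms > 0.0);
  return baseline_ms / optimized_ms;
}
```

With the numbers above, speedup(5825.0, 1081.0) evaluates to roughly 5.39, which rounds to the 5.4x quoted.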
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D74145/new/
https://reviews.llvm.org/D74145