[cfe-dev] openmp 4.5 and cuda streams
Alessandro Gabbana via cfe-dev
cfe-dev at lists.llvm.org
Wed Oct 30 10:28:31 PDT 2019
Dear All,
I'm using clang 9.0.0 to compile code that offloads some of its sections
to a GPU using the OpenMP target construct.
I also use the nowait clause to overlap the execution of certain kernels
and/or host<->device memory transfers.
However, using the NVIDIA profiler I've noticed that when I compile the
code with clang only one CUDA stream is active,
and therefore execution is serialized. On the other hand, when
compiling with XLC I see that kernels are executed
on different streams. I cannot tell whether this is the expected
behavior (e.g. the nowait clause is currently not supported)
or whether I'm missing something. I'm using an NVIDIA Tesla P100 GPU and
compiling with the following options:
-target x86_64-pc-linux-gnu -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda
-Xopenmp-target=nvptx64-nvidia-cuda -march=sm_60
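
As a rough illustration of the pattern in question (a minimal sketch, not the
actual application code; the function name run_two_async_kernels and the
array contents are made up for this example), two independent target regions
marked nowait might look like the following. Whether the two regions really
run concurrently, or on separate CUDA streams, depends on the compiler and
runtime; the sketch only shows how the request is expressed.

```c
#include <stdlib.h>

/* Hypothetical helper: launches two independent target regions with
 * nowait, waits for both, and returns 1 if the results are correct,
 * 0 if they are not, and -1 on allocation failure. */
int run_two_async_kernels(int n)
{
    double *a = malloc((size_t)n * sizeof *a);
    double *b = malloc((size_t)n * sizeof *b);
    if (!a || !b) {
        free(a);
        free(b);
        return -1;
    }

    for (int i = 0; i < n; ++i) {
        a[i] = 1.0;
        b[i] = 2.0;
    }

    /* nowait turns each target region into a deferred task, so the
     * encountering thread may continue without waiting for it. The two
     * regions touch different arrays, so they are free to overlap. */
    #pragma omp target teams distribute parallel for nowait map(tofrom: a[0:n])
    for (int i = 0; i < n; ++i)
        a[i] *= 2.0;

    #pragma omp target teams distribute parallel for nowait map(tofrom: b[0:n])
    for (int i = 0; i < n; ++i)
        b[i] += 1.0;

    /* Wait for both deferred target tasks before reading results on the host. */
    #pragma omp taskwait

    int ok = 1;
    for (int i = 0; i < n; ++i) {
        if (a[i] != 2.0 || b[i] != 3.0)
            ok = 0;
    }

    free(a);
    free(b);
    return ok;
}
```

Calling run_two_async_kernels(1024) should return 1 regardless of whether the
regions overlap; the overlap itself can only be observed in the profiler
timeline.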
Best wishes,
Alessandro