[Openmp-dev] OpenMP offload implicitly using streams

Wed Mar 20 09:32:19 PDT 2019

Hi all,
After going through the source, I didn't find CUDA stream support.
Luckily, I only need to add
#define CUDA_API_PER_THREAD_DEFAULT_STREAM
before
#include <cuda.h>
in libomptarget/plugins/cuda/src/rtl.cpp
Then target regions launched from different host threads go to different
streams and may execute concurrently.
#pragma omp parallel
{
  #pragma omp target
  {
    //offload computation
  }
}
This is exactly what I want.
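
For anyone who wants to try the pattern, here is a minimal self-contained
sketch. The function name scale_chunks and its parameters are hypothetical,
made up for illustration; nothing here comes from libomptarget. Each host
thread launches its own synchronous target region (no nowait) on a separate
chunk of the array. Whether the regions actually overlap on the device
depends on the runtime mapping them to different streams, so treat this as a
test case under that assumption rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: multiply every element of `data` by `factor`,
// splitting the work into `num_chunks` chunks. Each chunk is offloaded in
// its own target region, issued by whichever host thread runs that loop
// iteration.
void scale_chunks(std::vector<double> &data, int num_chunks, double factor) {
  assert(num_chunks > 0);
  double *ptr = data.data();
  const long n = static_cast<long>(data.size());
  const long chunk = (n + num_chunks - 1) / num_chunks;  // ceiling division

  // Host-side parallelism: iterations are distributed over host threads.
  #pragma omp parallel for
  for (int c = 0; c < num_chunks; ++c) {
    const long begin = c * chunk;
    const long end = (begin + chunk < n) ? begin + chunk : n;
    if (begin >= end) continue;  // nothing left for this chunk

    double *p = ptr + begin;
    const long len = end - begin;

    // Synchronous target region (no nowait). With per-thread default
    // streams, regions from different host threads may run concurrently.
    #pragma omp target teams distribute parallel for map(tofrom: p[0:len])
    for (long i = 0; i < len; ++i)
      p[i] *= factor;
  }
}
```

Calling scale_chunks(v, 4, 2.0) on a std::vector<double> v doubles every
element. Built without OpenMP, the pragmas are ignored and the code runs
serially on the host; built with something like
clang++ -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda
the per-chunk kernels can be inspected in nvprof to see which streams they
land on.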

I know the XL compiler uses streams in a different way but achieves similar
effects.
Is anyone working on using streams with OpenMP target in LLVM?
Will clang-ykt get something similar to XL, and will it be upstreamed to
mainline?

If we just add #define CUDA_API_PER_THREAD_DEFAULT_STREAM in the CUDA RTL,
will it cause any trouble?
As a compiler user, I'd prefer a proper solution over a patch that exists
only for myself.

Best,
Ye
===================
Ye Luo, Ph.D.
Computational Science Division & Leadership Computing Facility
Argonne National Laboratory

On Sun, Mar 17, 2019 at 2:26 PM, Ye Luo <xw111luoye at gmail.com> wrote:

> Hi,
> How do I turn on streams when using OpenMP offload?
> When different host threads individually start target regions (even
> without nowait), the offloaded computation goes to different CUDA streams
> and may execute concurrently. This is currently available in XL.
> With Clang, nvprof shows that the run uses only the default stream.
> Is there a way to do that with Clang?
> On the other hand,
> nvcc has the option --default-stream per-thread.
> I'm not familiar with Clang CUDA; is there a similar option?
> Best,
> Ye
> ===================
> Ye Luo, Ph.D.
> Computational Science Division & Leadership Computing Facility
> Argonne National Laboratory
>