[Openmp-dev] OpenMP offload implicitly using streams

Wed Mar 20 10:50:50 PDT 2019

On 3/20/19 12:25 PM, Alexey Bataev wrote:

Hi Hal, it is hard to tell. I can try adding an option to clang that defines this macro, if you want to try it.

That would be great, thanks!

 -Hal

-------------
Best regards,
Alexey Bataev

On 20.03.2019 13:12, Finkel, Hal J. wrote:

Thanks, Ye. I suppose that I thought it always worked that way :-)

Alexey, Doru, do you know if there's any semantic problem or other concerns with enabling this option and/or making it the default?

 -Hal

On 3/20/19 11:32 AM, Ye Luo via Openmp-dev wrote:
Hi all,
After going through the source, I didn't find CUDA stream support.
Luckily, I only need to add
#define CUDA_API_PER_THREAD_DEFAULT_STREAM
before
#include <cuda.h>
in libomptarget/plugins/cuda/src/rtl.cpp
Then the multiple target regions go to different streams and may execute concurrently.
#pragma omp parallel
{
  #pragma omp target
  {
    //offload computation
  }
}
This is exactly what I want.
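For reference, a fuller version of the pattern above might look like the following sketch. The function name offload_chunks, the chunking scheme, and the map clause are illustrative assumptions, not code from libomptarget or from my actual application. Each host thread issues its own target region; whether those regions really land on separate CUDA streams depends on the runtime (for example, on the CUDA_API_PER_THREAD_DEFAULT_STREAM change described above).

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: every host thread offloads one independent chunk.
// Returns the sum of all elements after the offloaded updates.
long long offload_chunks(int num_chunks, int chunk_size) {
  assert(num_chunks >= 0 && chunk_size >= 0);
  std::vector<int> data(static_cast<std::size_t>(num_chunks) * chunk_size, 1);
  int *base = data.data();

  // Host-side parallelism: each iteration is run by some host thread,
  // and that thread launches its own target region.
  #pragma omp parallel for
  for (int c = 0; c < num_chunks; ++c) {
    int *chunk = base + static_cast<long>(c) * chunk_size;

    // Offloaded computation: add the chunk index to every element.
    // Only this thread's chunk is mapped to and from the device.
    #pragma omp target teams distribute parallel for map(tofrom: chunk[0:chunk_size])
    for (int i = 0; i < chunk_size; ++i)
      chunk[i] += c;
  }

  // Check the result on the host.
  long long sum = 0;
  for (int v : data)
    sum += v;
  return sum;
}
```

When built with offloading enabled (for example, clang++ -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda), a profiler such as nvprof can show whether the target regions share the default stream or run on separate ones. Without OpenMP the pragmas are ignored and the function computes the same result serially, so offload_chunks(4, 1000) returns 10000 either way.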

I know the XL compiler uses streams in a different way but achieves similar effects.
Is anyone working on using streams with OpenMP target in LLVM?
Will clang-ykt get something similar to XL and upstream it to the mainline?

If we just add #define CUDA_API_PER_THREAD_DEFAULT_STREAM in the CUDA RTL, will it cause any trouble?
As a compiler user, I'd like to have a better solution rather than having a patch just for myself.

Best,
Ye
===================
Ye Luo, Ph.D.
Computational Science Division & Leadership Computing Facility
Argonne National Laboratory

On Sun, Mar 17, 2019 at 2:26 PM, Ye Luo <xw111luoye at gmail.com> wrote:
Hi,
How do I turn on streams when using OpenMP offload?
When different host threads individually start target regions (even without nowait), the offloaded computation goes to different CUDA streams and may execute concurrently. This is currently available in XL.
With Clang, nvprof shows that the run uses only the default stream.
Is there a way to do that with Clang?
On the other hand, nvcc has the option --default-stream per-thread.
I'm not familiar with clang CUDA; is there a similar option?
Best,
Ye

_______________________________________________
Openmp-dev mailing list
Openmp-dev at lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
