[cfe-dev] openmp 4.5 and cuda streams

Finkel, Hal J. via cfe-dev cfe-dev at lists.llvm.org
Thu Oct 31 13:36:49 PDT 2019


On 10/31/19 3:06 PM, Alexey Bataev wrote:

Hope to send this message from the main dev e-mail this time :)


Well, about the memory: it depends on the number of kernels you have. Currently, all the memory in the kernels that must be globalized is squashed into a single union. With streams, we would need a separate structure for each particular kernel. Plus, we could no longer use shared memory for this buffer because of possible conflicts.


We can add a new compiler option to compile only some files with streams support and use a unique memory buffer for the globalized variables. Plus, some work in libomptarget is required, of course.


Do we also need some kind of libomptarget API change to communicate that it's allowed to run multiple target regions concurrently?


Thanks again,

Hal



-------------
Best regards,
Alexey Bataev

31.10.2019 3:58 PM, Finkel, Hal J. wrote:


On 10/31/19 10:54 AM, Luo, Ye wrote:
Hi Hal,
My experience with llvm/clang so far shows:
1. All target offload is blocking and synchronous, using the default stream. nowait is not supported.
2. All memory-transfer calls invoke cudaMemcpy. There are no async calls.
3. In a past experiment I turned on CUDA_API_PER_THREAD_DEFAULT_STREAM in libomptarget,
then used multiple host threads to do individual blocking synchronous offloads. I got it sort of running and saw multiple streams, but the code crashes with memory corruption, probably due to a data race in libomptarget.


Thanks, Ye. That's consistent with Alexey's comments.


Is there already a bug open on this? If not, we should open one.


Alexey, regarding the buffer-reuse optimizations in Clang that you mentioned: how much memory/overhead do they save? Is it worth keeping them available in some mode?


 -Hal


Best,
Ye

________________________________
From: Finkel, Hal J. <hfinkel at anl.gov>
Sent: Wednesday, October 30, 2019 1:40 PM
To: Alessandro Gabbana <gbblsn at unife.it>; cfe-dev at lists.llvm.org; Luo, Ye <yeluo at anl.gov>; Doerfert, Johannes <jdoerfert at anl.gov>
Subject: Re: [cfe-dev] openmp 4.5 and cuda streams

[+Ye, Johannes]

I recall that we've also observed this behavior. Ye, Johannes, we had a
work-around and a patch, correct?

  -Hal

On 10/30/19 12:28 PM, Alessandro Gabbana via cfe-dev wrote:
> Dear All,
>
> I'm using clang 9.0.0 to compile a code that offloads sections to a
> GPU using the OpenMP target construct.
> I also use the nowait clause to overlap the execution of certain
> kernels and/or host<->device memory transfers.
> However, using the NVIDIA profiler I've noticed that when I compile
> the code with clang only one CUDA stream is active,
> and therefore the execution gets serialized. On the other hand, when
> compiling with XLC I see that kernels are executed
> on different streams. I could not tell whether this is the expected
> behavior (e.g. the nowait clause is currently not supported)
> or whether I'm missing something. I'm using an NVIDIA Tesla P100 GPU and
> compiling with the following options:
>
> -target x86_64-pc-linux-gnu -fopenmp
> -fopenmp-targets=nvptx64-nvidia-cuda
> -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_60
>
> best wishes
>
> Alessandro
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


