[cfe-dev] openmp 4.5 and cuda streams

Alexey Bataev via cfe-dev cfe-dev at lists.llvm.org
Thu Oct 31 13:06:37 PDT 2019


Hope to send this message from the main dev e-mail this time :)


Well, about the memory: it depends on the number of kernels you have.
All the memory in the kernels that must be globalized is squashed into a
single union. With streams, we would need a separate structure for each
kernel. Plus, we could no longer use shared memory for this buffer,
again because of possible conflicts.


We can add a new compiler option to compile only some files with stream
support and use a unique memory buffer for the globalized variables. Plus,
some work in libomptarget is required, of course.


-------------
Best regards,
Alexey Bataev

31.10.2019 3:58 PM, Finkel, Hal J. wrote:
>
>
> On 10/31/19 10:54 AM, Luo, Ye wrote:
>> Hi Hal,
>> My experience of llvm/clang so far shows:
>> 1. all the target offload is blocking synchronous using the default
>> stream. nowait is not supported.
>> 2. all the memory transfer calls invoke cudaMemcpy. There are no
>> async calls.
>> 3. I had an experiment in the past turning on
>> CUDA_API_PER_THREAD_DEFAULT_STREAM in libomptarget.
>> Then I use multiple host threads to do individual blocking
>> synchronous offload. I got it sort of running and saw multiple streams
>> but the code crashes due to memory corruption probably due to some
>> data race in libomptarget.
>
>
> Thanks, Ye. That's consistent with Alexey's comments.
>
>
> Is there already a bug open on this? If not, we should open one.
>
>
> Alexey, the buffer-reuse optimizations in Clang that you mentioned,
> how much memory/overhead do they save? Is it worth keeping them in
> some mode?
>
>
>  -Hal
>
>
>> Best,
>> Ye
>>
>> ------------------------------------------------------------------------
>> *From:* Finkel, Hal J. <hfinkel at anl.gov>
>> *Sent:* Wednesday, October 30, 2019 1:40 PM
>> *To:* Alessandro Gabbana <gbblsn at unife.it>; cfe-dev at lists.llvm.org
>> <cfe-dev at lists.llvm.org>; Luo, Ye <yeluo at anl.gov>; Doerfert, Johannes
>> <jdoerfert at anl.gov>
>> *Subject:* Re: [cfe-dev] openmp 4.5 and cuda streams
>>  
>> [+Ye, Johannes]
>>
>> I recall that we've also observed this behavior. Ye, Johannes, we had a
>> work-around and a patch, correct?
>>
>>   -Hal
>>
>> On 10/30/19 12:28 PM, Alessandro Gabbana via cfe-dev wrote:
>> > Dear All,
>> >
>> > I'm using clang 9.0.0 to compile a code which offloads sections of a
>> > code on a GPU using the openmp target construct.
>> > I also use the nowait clause to overlap the execution of certain
>> > kernels and/or host<->device memory transfers.
>> > However, using the nvidia profiler I've noticed that when I compile
>> > the code with clang only one cuda stream is active,
>> > and therefore the execution gets serialized. On the other hand, when
>> > compiling with XLC I see that kernels are executed
>> > on different streams. I could not understand if this is the expected
>> > behavior (e.g. the nowait clause is currently not supported),
>> > or if I'm missing something. I'm using a NVIDIA Tesla P100 GPU and
>> > compiling with the following options:
>> >
>> > -target x86_64-pc-linux-gnu -fopenmp
>> > -fopenmp-targets=nvptx64-nvidia-cuda
>> > -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_60
>> >
>> > best wishes
>> >
>> > Alessandro
>> >
>>
>> -- 
>> Hal Finkel
>> Lead, Compiler Technology and Programming Languages
>> Leadership Computing Facility
>> Argonne National Laboratory
>>