[cfe-dev] openmp 4.5 and cuda streams
Alexey Bataev via cfe-dev
cfe-dev at lists.llvm.org
Wed Oct 30 12:28:22 PDT 2019
Hal, it seems to me that not everything is protected. I assume some
buffers are reused for different kernels. It would be better to ask Alex
Eichenberger, who knows more about it; I did not investigate this problem.
As for clang, we try to reduce the size of the buffers in global
memory for the reduction/lastprivate/etc. variables, which may escape
their declaration context. These buffers cannot be combined in streams
mode; a unique buffer has to be allocated for each particular kernel.
It is not very hard to do, it is just not implemented yet.
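
For reference, the pattern under discussion can be sketched roughly as
follows. This is a minimal illustration in C, not code from the original
report; the function name scale_two_arrays and its parameters are
hypothetical. Two independent target regions are launched with nowait, so
the runtime is free to overlap them, which is the situation in which each
in-flight kernel would need its own buffer. Whether they actually run on
separate CUDA streams depends on the compiler and runtime.

```c
#include <stddef.h>

/* Hypothetical example: scale two unrelated arrays in two independent
 * target regions. With nowait, each region becomes a deferred target
 * task, so the runtime may execute them concurrently. */
void scale_two_arrays(double *a, double *b, size_t n, double fa, double fb)
{
    #pragma omp target teams distribute parallel for nowait map(tofrom: a[0:n])
    for (size_t i = 0; i < n; ++i)
        a[i] *= fa;

    #pragma omp target teams distribute parallel for nowait map(tofrom: b[0:n])
    for (size_t i = 0; i < n; ++i)
        b[i] *= fb;

    /* Wait for both deferred target tasks before the host touches a or b. */
    #pragma omp taskwait
}
```

The taskwait is required before the host reads either array. If the code
is built without OpenMP support, the pragmas are ignored and both loops
simply run sequentially on the host.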
-------------
Best regards,
Alexey Bataev
30.10.2019 3:22 PM, Finkel, Hal J. wrote:
> On 10/30/19 1:48 PM, GMail wrote:
>> I don't think it will be very easy. It requires some additional work
>> in libomptarget plus some fixes in clang itself. Otherwise there
>> might be some race conditions.
>>
> Can you be more specific? I thought that the mapping table, etc. were
> already appropriately protected.
>
> As a general thought, we should probably have a mode in which the
> runtime is compiled with ThreadSanitizer to check for these kinds of things.
>
> Thanks again,
>
> Hal
>
>
>> -------------
>> Best regards,
>> Alexey Bataev
>> 30.10.2019 2:40 PM, Finkel, Hal J. via cfe-dev wrote:
>>> [+Ye, Johannes]
>>>
>>> I recall that we've also observed this behavior. Ye, Johannes, we had a
>>> work-around and a patch, correct?
>>>
>>> -Hal
>>>
>>> On 10/30/19 12:28 PM, Alessandro Gabbana via cfe-dev wrote:
>>>> Dear All,
>>>>
>>>> I'm using clang 9.0.0 to compile a code that offloads sections of the
>>>> code to a GPU using the OpenMP target construct.
>>>> I also use the nowait clause to overlap the execution of certain
>>>> kernels and/or host<->device memory transfers.
>>>> However, using the NVIDIA profiler I've noticed that when I compile
>>>> the code with clang only one CUDA stream is active,
>>>> and therefore the execution is serialized. On the other hand, when
>>>> compiling with XLC I see that kernels are executed
>>>> on different streams. I could not work out whether this is the expected
>>>> behavior (e.g. the nowait clause is currently not supported),
>>>> or if I'm missing something. I'm using an NVIDIA Tesla P100 GPU and
>>>> compiling with the following options:
>>>>
>>>> -target x86_64-pc-linux-gnu -fopenmp
>>>> -fopenmp-targets=nvptx64-nvidia-cuda
>>>> -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_60
>>>>
>>>> best wishes
>>>>
>>>> Alessandro
>>>>