[Openmp-commits] [openmp] r350333 - [OPENMP][NVPTX]Fix incompatibility of __syncthreads with LLVM, NFC.
Finkel, Hal J. via Openmp-commits
openmp-commits at lists.llvm.org
Thu Jan 3 10:46:52 PST 2019
On 1/3/19 12:40 PM, Alexey Bataev wrote:
>
> It is a bug in the LLVM optimizer, somewhere in the split critical
> edges pass or something like this, I don't remember exactly.
>
Okay. We should definitely have a tracking bug for it. Doru, can you
please take care of that? Is NVIDIA working on an upstream fix?
Thanks again,
Hal
> -------------
> Best regards,
> Alexey Bataev
> 03.01.2019 13:38, Finkel, Hal J. пишет:
>> On 1/3/19 12:32 PM, Alexey Bataev wrote:
>>> Hi Hal, this problem was reported to NVidia by Doru Bercea and was
>>> confirmed, as far as I know. I don't know if he reported it in the
>>> bugzilla. But the problem exists.
>> So this is a bug in Clang's CUDA, in nvcc, in ptxas?
>>
>> Thanks again,
>>
>> Hal
>>
>>
>>> -------------
>>> Best regards,
>>> Alexey Bataev
>>>
>>> 03.01.2019 13:30, Finkel, Hal J. пишет:
>>>> Hi, Alexey,
>>>>
>>>> We shouldn't have a semantic problem with __syncthreads because of how
>>>> inference works on the convergent attribute. Do we have an open bug
>>>> describing the problem, and if not, can you please file one? It sounds
>>>> like this tail cloning in critical-edge splitting needs to be updated to
>>>> respect the convergent attribute?
>>>>
>>>> Thanks again,
>>>>
>>>> Hal
>>>>
>>>> On 1/3/19 11:43 AM, Alexey Bataev via Openmp-commits wrote:
>>>>> Author: abataev
>>>>> Date: Thu Jan 3 09:43:46 2019
>>>>> New Revision: 350333
>>>>>
>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=350333&view=rev
>>>>> Log:
>>>>> [OPENMP][NVPTX]Fix incompatibility of __syncthreads with LLVM, NFC.
>>>>>
>>>>> Summary:
>>>>> One of the LLVM optimizations, split critical edges, also clones tail
>>>>> instructions. This is a dangerous operation for __syncthreads()
>>>>> functions and this transformation leads to undefined behavior or
>>>>> incorrect results. Patch fixes this problem by replacing __syncthreads()
>>>>> function with the assembler instruction, which cost is too high and
>>>>> wich cannot be copied.
>>>>>
>>>>> Reviewers: grokos, gtbercea, kkwli0
>>>>>
>>>>> Subscribers: guansong, openmp-commits, caomhin
>>>>>
>>>>> Differential Revision: https://reviews.llvm.org/D56274
>>>>>
>>>>> Modified:
>>>>> openmp/trunk/libomptarget/deviceRTLs/nvptx/src/data_sharing.cu
>>>>> openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.cu
>>>>> openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h
>>>>> openmp/trunk/libomptarget/deviceRTLs/nvptx/src/sync.cu
>>>>>
>>>>> Modified: openmp/trunk/libomptarget/deviceRTLs/nvptx/src/data_sharing.cu
>>>>> URL: http://llvm.org/viewvc/llvm-project/openmp/trunk/libomptarget/deviceRTLs/nvptx/src/data_sharing.cu?rev=350333&r1=350332&r2=350333&view=diff
>>>>> ==============================================================================
>>>>> --- openmp/trunk/libomptarget/deviceRTLs/nvptx/src/data_sharing.cu (original)
>>>>> +++ openmp/trunk/libomptarget/deviceRTLs/nvptx/src/data_sharing.cu Thu Jan 3 09:43:46 2019
>>>>> @@ -564,7 +564,8 @@ EXTERN void __kmpc_get_team_static_memor
>>>>> if (GetThreadIdInBlock() == 0) {
>>>>> *frame = omptarget_nvptx_simpleMemoryManager.Acquire(buf, size);
>>>>> }
>>>>> - __syncthreads();
>>>>> + // FIXME: use __syncthreads instead when the function copy is fixed in LLVM.
>>>>> + __SYNCTHREADS();
>>>>> return;
>>>>> }
>>>>> ASSERT0(LT_FUSSY, GetThreadIdInBlock() == getMasterThreadId(),
>>>>> @@ -577,7 +578,8 @@ EXTERN void __kmpc_restore_team_static_m
>>>>> if (is_shared)
>>>>> return;
>>>>> if (isSPMDMode()) {
>>>>> - __syncthreads();
>>>>> + // FIXME: use __syncthreads instead when the function copy is fixed in LLVM.
>>>>> + __SYNCTHREADS();
>>>>> if (GetThreadIdInBlock() == 0) {
>>>>> omptarget_nvptx_simpleMemoryManager.Release();
>>>>> }
>>>>>
>>>>> Modified: openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.cu
>>>>> URL: http://llvm.org/viewvc/llvm-project/openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.cu?rev=350333&r1=350332&r2=350333&view=diff
>>>>> ==============================================================================
>>>>> --- openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.cu (original)
>>>>> +++ openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.cu Thu Jan 3 09:43:46 2019
>>>>> @@ -105,7 +105,8 @@ EXTERN void __kmpc_spmd_kernel_init(int
>>>>> omptarget_nvptx_simpleThreadPrivateContext =
>>>>> omptarget_nvptx_device_simpleState[slot].Dequeue();
>>>>> }
>>>>> - __syncthreads();
>>>>> + // FIXME: use __syncthreads instead when the function copy is fixed in LLVM.
>>>>> + __SYNCTHREADS();
>>>>> omptarget_nvptx_simpleThreadPrivateContext->Init();
>>>>> return;
>>>>> }
>>>>> @@ -129,7 +130,8 @@ EXTERN void __kmpc_spmd_kernel_init(int
>>>>> // init team context
>>>>> currTeamDescr.InitTeamDescr();
>>>>> }
>>>>> - __syncthreads();
>>>>> + // FIXME: use __syncthreads instead when the function copy is fixed in LLVM.
>>>>> + __SYNCTHREADS();
>>>>>
>>>>> omptarget_nvptx_TeamDescr &currTeamDescr = getMyTeamDescriptor();
>>>>> omptarget_nvptx_WorkDescr &workDescr = getMyWorkDescriptor();
>>>>> @@ -170,7 +172,8 @@ EXTERN __attribute__((deprecated)) void
>>>>> EXTERN void __kmpc_spmd_kernel_deinit_v2(int16_t RequiresOMPRuntime) {
>>>>> // We're not going to pop the task descr stack of each thread since
>>>>> // there are no more parallel regions in SPMD mode.
>>>>> - __syncthreads();
>>>>> + // FIXME: use __syncthreads instead when the function copy is fixed in LLVM.
>>>>> + __SYNCTHREADS();
>>>>> int threadId = GetThreadIdInBlock();
>>>>> if (!RequiresOMPRuntime) {
>>>>> if (threadId == 0) {
>>>>>
>>>>> Modified: openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h
>>>>> URL: http://llvm.org/viewvc/llvm-project/openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h?rev=350333&r1=350332&r2=350333&view=diff
>>>>> ==============================================================================
>>>>> --- openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h (original)
>>>>> +++ openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h Thu Jan 3 09:43:46 2019
>>>>> @@ -63,6 +63,9 @@
>>>>> #define __ACTIVEMASK() __ballot(1)
>>>>> #endif
>>>>>
>>>>> +#define __SYNCTHREADS_N(n) asm volatile("bar.sync %0;" : : "r"(n) : "memory");
>>>>> +#define __SYNCTHREADS() __SYNCTHREADS_N(0)
>>>>> +
>>>>> // arguments needed for L0 parallelism only.
>>>>> class omptarget_nvptx_SharedArgs {
>>>>> public:
>>>>>
>>>>> Modified: openmp/trunk/libomptarget/deviceRTLs/nvptx/src/sync.cu
>>>>> URL: http://llvm.org/viewvc/llvm-project/openmp/trunk/libomptarget/deviceRTLs/nvptx/src/sync.cu?rev=350333&r1=350332&r2=350333&view=diff
>>>>> ==============================================================================
>>>>> --- openmp/trunk/libomptarget/deviceRTLs/nvptx/src/sync.cu (original)
>>>>> +++ openmp/trunk/libomptarget/deviceRTLs/nvptx/src/sync.cu Thu Jan 3 09:43:46 2019
>>>>> @@ -74,7 +74,8 @@ EXTERN void __kmpc_barrier(kmp_Ident *lo
>>>>> // parallel region and that all worker threads participate.
>>>>> EXTERN void __kmpc_barrier_simple_spmd(kmp_Ident *loc_ref, int32_t tid) {
>>>>> PRINT0(LD_SYNC, "call kmpc_barrier_simple_spmd\n");
>>>>> - __syncthreads();
>>>>> + // FIXME: use __syncthreads instead when the function copy is fixed in LLVM.
>>>>> + __SYNCTHREADS();
>>>>> PRINT0(LD_SYNC, "completed kmpc_barrier_simple_spmd\n");
>>>>> }
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Openmp-commits mailing list
>>>>> Openmp-commits at lists.llvm.org
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-commits
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
More information about the Openmp-commits
mailing list