[Openmp-commits] [openmp] r350333 - [OPENMP][NVPTX]Fix incompatibility of __syncthreads with LLVM, NFC.

Finkel, Hal J. via Openmp-commits openmp-commits at lists.llvm.org
Thu Jan 3 10:46:52 PST 2019


On 1/3/19 12:40 PM, Alexey Bataev wrote:
>
> It is a bug in the LLVM optimizer, somewhere in the split critical
> edges pass or something like this, I don't remember exactly.
>

Okay. We should definitely have a tracking bug for it. Doru, can you
please take care of that? Is NVIDIA working on an upstream fix?

Thanks again,

Hal

> -------------
> Best regards,
> Alexey Bataev
> 03.01.2019 13:38, Finkel, Hal J. пишет:
>> On 1/3/19 12:32 PM, Alexey Bataev wrote:
>>> Hi Hal, this problem was reported to NVidia by Doru Bercea and was
>>> confirmed, as far as I know. I don't know if he reported it in the
>>> bugzilla. But the problem exists.
>> So this is a bug in Clang's CUDA, in nvcc, in ptxas?
>>
>> Thanks again,
>>
>> Hal
>>
>>
>>> -------------
>>> Best regards,
>>> Alexey Bataev
>>>
>>> 03.01.2019 13:30, Finkel, Hal J. пишет:
>>>> Hi, Alexey,
>>>>
>>>> We shouldn't have a semantic problem with __syncthreads because of how
>>>> inference works on the convergent attribute. Do we have an open bug
>>>> describing the problem, and if not, can you please file one? It sounds
>>>> like this tail cloning in critical-edge splitting needs to be updated to
>>>> respect the convergent attribute?
>>>>
>>>> Thanks again,
>>>>
>>>> Hal
>>>>
>>>> On 1/3/19 11:43 AM, Alexey Bataev via Openmp-commits wrote:
>>>>> Author: abataev
>>>>> Date: Thu Jan  3 09:43:46 2019
>>>>> New Revision: 350333
>>>>>
>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=350333&view=rev
>>>>> Log:
>>>>> [OPENMP][NVPTX]Fix incompatibility of __syncthreads with LLVM, NFC.
>>>>>
>>>>> Summary:
>>>>> One of the LLVM optimizations, split critical edges, also clones tail
>>>>> instructions. This is a dangerous operation for __syncthreads()
>>>>> functions and this transformation leads to undefined behavior or
>>>>> incorrect results. Patch fixes this problem by replacing __syncthreads()
>>>>> function with the assembler instruction, which cost is too high and
>>>>> wich cannot be copied.
>>>>>
>>>>> Reviewers: grokos, gtbercea, kkwli0
>>>>>
>>>>> Subscribers: guansong, openmp-commits, caomhin
>>>>>
>>>>> Differential Revision: https://reviews.llvm.org/D56274
>>>>>
>>>>> Modified:
>>>>>     openmp/trunk/libomptarget/deviceRTLs/nvptx/src/data_sharing.cu
>>>>>     openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.cu
>>>>>     openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h
>>>>>     openmp/trunk/libomptarget/deviceRTLs/nvptx/src/sync.cu
>>>>>
>>>>> Modified: openmp/trunk/libomptarget/deviceRTLs/nvptx/src/data_sharing.cu
>>>>> URL: http://llvm.org/viewvc/llvm-project/openmp/trunk/libomptarget/deviceRTLs/nvptx/src/data_sharing.cu?rev=350333&r1=350332&r2=350333&view=diff
>>>>> ==============================================================================
>>>>> --- openmp/trunk/libomptarget/deviceRTLs/nvptx/src/data_sharing.cu (original)
>>>>> +++ openmp/trunk/libomptarget/deviceRTLs/nvptx/src/data_sharing.cu Thu Jan  3 09:43:46 2019
>>>>> @@ -564,7 +564,8 @@ EXTERN void __kmpc_get_team_static_memor
>>>>>      if (GetThreadIdInBlock() == 0) {
>>>>>        *frame = omptarget_nvptx_simpleMemoryManager.Acquire(buf, size);
>>>>>      }
>>>>> -    __syncthreads();
>>>>> +    // FIXME: use __syncthreads instead when the function copy is fixed in LLVM.
>>>>> +    __SYNCTHREADS();
>>>>>      return;
>>>>>    }
>>>>>    ASSERT0(LT_FUSSY, GetThreadIdInBlock() == getMasterThreadId(),
>>>>> @@ -577,7 +578,8 @@ EXTERN void __kmpc_restore_team_static_m
>>>>>    if (is_shared)
>>>>>      return;
>>>>>    if (isSPMDMode()) {
>>>>> -    __syncthreads();
>>>>> +    // FIXME: use __syncthreads instead when the function copy is fixed in LLVM.
>>>>> +    __SYNCTHREADS();
>>>>>      if (GetThreadIdInBlock() == 0) {
>>>>>        omptarget_nvptx_simpleMemoryManager.Release();
>>>>>      }
>>>>>
>>>>> Modified: openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.cu
>>>>> URL: http://llvm.org/viewvc/llvm-project/openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.cu?rev=350333&r1=350332&r2=350333&view=diff
>>>>> ==============================================================================
>>>>> --- openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.cu (original)
>>>>> +++ openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.cu Thu Jan  3 09:43:46 2019
>>>>> @@ -105,7 +105,8 @@ EXTERN void __kmpc_spmd_kernel_init(int
>>>>>        omptarget_nvptx_simpleThreadPrivateContext =
>>>>>            omptarget_nvptx_device_simpleState[slot].Dequeue();
>>>>>      }
>>>>> -    __syncthreads();
>>>>> +    // FIXME: use __syncthreads instead when the function copy is fixed in LLVM.
>>>>> +    __SYNCTHREADS();
>>>>>      omptarget_nvptx_simpleThreadPrivateContext->Init();
>>>>>      return;
>>>>>    }
>>>>> @@ -129,7 +130,8 @@ EXTERN void __kmpc_spmd_kernel_init(int
>>>>>      // init team context
>>>>>      currTeamDescr.InitTeamDescr();
>>>>>    }
>>>>> -  __syncthreads();
>>>>> +  // FIXME: use __syncthreads instead when the function copy is fixed in LLVM.
>>>>> +  __SYNCTHREADS();
>>>>>  
>>>>>    omptarget_nvptx_TeamDescr &currTeamDescr = getMyTeamDescriptor();
>>>>>    omptarget_nvptx_WorkDescr &workDescr = getMyWorkDescriptor();
>>>>> @@ -170,7 +172,8 @@ EXTERN __attribute__((deprecated)) void
>>>>>  EXTERN void __kmpc_spmd_kernel_deinit_v2(int16_t RequiresOMPRuntime) {
>>>>>    // We're not going to pop the task descr stack of each thread since
>>>>>    // there are no more parallel regions in SPMD mode.
>>>>> -  __syncthreads();
>>>>> +  // FIXME: use __syncthreads instead when the function copy is fixed in LLVM.
>>>>> +  __SYNCTHREADS();
>>>>>    int threadId = GetThreadIdInBlock();
>>>>>    if (!RequiresOMPRuntime) {
>>>>>      if (threadId == 0) {
>>>>>
>>>>> Modified: openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h
>>>>> URL: http://llvm.org/viewvc/llvm-project/openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h?rev=350333&r1=350332&r2=350333&view=diff
>>>>> ==============================================================================
>>>>> --- openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h (original)
>>>>> +++ openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h Thu Jan  3 09:43:46 2019
>>>>> @@ -63,6 +63,9 @@
>>>>>  #define __ACTIVEMASK() __ballot(1)
>>>>>  #endif
>>>>>  
>>>>> +#define __SYNCTHREADS_N(n) asm volatile("bar.sync %0;" : : "r"(n) : "memory");
>>>>> +#define __SYNCTHREADS() __SYNCTHREADS_N(0)
>>>>> +
>>>>>  // arguments needed for L0 parallelism only.
>>>>>  class omptarget_nvptx_SharedArgs {
>>>>>  public:
>>>>>
>>>>> Modified: openmp/trunk/libomptarget/deviceRTLs/nvptx/src/sync.cu
>>>>> URL: http://llvm.org/viewvc/llvm-project/openmp/trunk/libomptarget/deviceRTLs/nvptx/src/sync.cu?rev=350333&r1=350332&r2=350333&view=diff
>>>>> ==============================================================================
>>>>> --- openmp/trunk/libomptarget/deviceRTLs/nvptx/src/sync.cu (original)
>>>>> +++ openmp/trunk/libomptarget/deviceRTLs/nvptx/src/sync.cu Thu Jan  3 09:43:46 2019
>>>>> @@ -74,7 +74,8 @@ EXTERN void __kmpc_barrier(kmp_Ident *lo
>>>>>  // parallel region and that all worker threads participate.
>>>>>  EXTERN void __kmpc_barrier_simple_spmd(kmp_Ident *loc_ref, int32_t tid) {
>>>>>    PRINT0(LD_SYNC, "call kmpc_barrier_simple_spmd\n");
>>>>> -  __syncthreads();
>>>>> +  // FIXME: use __syncthreads instead when the function copy is fixed in LLVM.
>>>>> +  __SYNCTHREADS();
>>>>>    PRINT0(LD_SYNC, "completed kmpc_barrier_simple_spmd\n");
>>>>>  }
>>>>>  
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Openmp-commits mailing list
>>>>> Openmp-commits at lists.llvm.org
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-commits

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory



More information about the Openmp-commits mailing list