[all-commits] [llvm/llvm-project] 36d621: [OpenMP] Ensure memory fences are created with bar...
Ye Luo via All-commits
all-commits at lists.llvm.org
Fri Mar 24 18:40:05 PDT 2023
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 36d6217c4eb02c15168bf74c9f7ef44ea4fb7e41
https://github.com/llvm/llvm-project/commit/36d6217c4eb02c15168bf74c9f7ef44ea4fb7e41
Author: Ye Luo <yeluo at anl.gov>
Date: 2023-03-24 (Fri, 24 Mar 2023)
Changed paths:
M openmp/libomptarget/DeviceRTL/include/Synchronization.h
M openmp/libomptarget/DeviceRTL/src/Kernel.cpp
M openmp/libomptarget/DeviceRTL/src/Parallelism.cpp
M openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
A openmp/libomptarget/test/offloading/barrier_fence.c
Log Message:
-----------
[OpenMP] Ensure memory fences are created with barriers for AMDGPUs
It turns out that `__builtin_amdgcn_s_barrier()` alone does not emit a
fence. We got away with this so far, having assumed it would work because
the NVIDIA path, where we just emit a `__syncthreads`, is (hopefully)
correct. After talking to @arsenm, we now (mostly) align with the
OpenCL barrier implementation [1] and emit explicit fences for AMDGPUs.
This seems to have been the underlying cause of #59759, but I am not 100%
certain; there is a chance it simply hides the problem.
Fixes: https://github.com/llvm/llvm-project/issues/59759
[1] https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/07b347366eb2c6ebc3414af323c623cbbbafc854/opencl/src/workgroup/wgbarrier.cl#L21
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D145290
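
The point of the log message, that an execution barrier does not by itself
order memory, can be illustrated with a host-side C++ analogy. This is only a
sketch and not the DeviceRTL change: `FencedBarrier` and `run_barrier_demo`
are hypothetical names, the barrier is single-use, and placing a release fence
before the rendezvous and an acquire fence after it is an assumption about
what "explicit fences" means here. The arrival counter uses relaxed atomics
only, so it plays the role of a bare barrier; the two fences are what make
pre-barrier writes visible to other threads.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical host-side stand-in for a workgroup barrier (single use).
// The counter is accessed with relaxed atomics only, so on its own it gives
// no memory ordering; the explicit fences provide it.
class FencedBarrier {
public:
  explicit FencedBarrier(int num_threads) : expected_(num_threads) {}

  void wait() {
    // Release fence: order this thread's earlier writes before its arrival.
    std::atomic_thread_fence(std::memory_order_release);
    arrived_.fetch_add(1, std::memory_order_relaxed);
    // Rendezvous: spin until every thread has arrived.
    while (arrived_.load(std::memory_order_relaxed) < expected_)
      std::this_thread::yield();
    // Acquire fence: order later reads after the other threads' arrivals.
    std::atomic_thread_fence(std::memory_order_acquire);
  }

private:
  const int expected_;
  std::atomic<int> arrived_{0};
};

// Each thread writes its own slot, waits at the barrier, then reads its
// neighbor's slot. Returns how many threads observed the neighbor's write.
int run_barrier_demo(int num_threads) {
  assert(num_threads > 0);
  std::vector<int> slots(num_threads, -1); // plain, non-atomic data
  std::vector<int> seen(num_threads, 0);   // one result element per thread
  FencedBarrier barrier(num_threads);
  std::vector<std::thread> threads;
  for (int tid = 0; tid < num_threads; ++tid) {
    threads.emplace_back([&, tid] {
      slots[tid] = tid; // write before the barrier
      barrier.wait();
      int neighbor = (tid + 1) % num_threads;
      seen[tid] = (slots[neighbor] == neighbor) ? 1 : 0; // read after it
    });
  }
  for (auto &t : threads)
    t.join();
  int ok = 0;
  for (int s : seen)
    ok += s;
  return ok;
}
```

Calling `run_barrier_demo(4)` starts four threads and counts how many saw
their neighbor's pre-barrier write. Without the two fences, the unsynchronized
reads of `slots` would be a data race under the C++ memory model, which is the
host-side counterpart of the missing-fence problem described above.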