[Openmp-commits] [PATCH] D154172: [OpenMP] Added memory scope to atomic::inc API and used the device scope in reduction.
Dhruva Chakrabarti via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Fri Jun 30 00:17:57 PDT 2023
dhruvachak added inline comments.
================
Comment at: openmp/libomptarget/DeviceRTL/src/Reduction.cpp:226
+#ifndef __AMDGCN__
fence::system(atomic::seq_cst);
----------------
jdoerfert wrote:
> dhruvachak wrote:
> > arsenm wrote:
> > > I doubt this is target specific
> > Well, I don't know whether it is required for NVPTX for example, so I kept it around.
> >
> > As an aside, I think the system fence is too conservative, but again I kept it unchanged for archs other than amdgpu.
> > As an aside, I think the system fence is too conservative, but again I kept it unchanged for archs other than amdgpu.
>
> 1) This is unrelated.
> 2) Could you elaborate on why you think the change is sound (on AMDGPUs)?
1. Agreed this is unrelated. I can separate this out if you like.
2. With reference to https://llvm.org/docs/AMDGPUUsage.html#amdgpu-amdhsa-memory-model-gfx90a (gfx90a as an example), the guide points out the required instructions. And that's what we get with this patch for the atomic inc:
s_waitcnt vmcnt(0) lgkmcnt(0)
global_atomic_inc v0, v1, v0, s[2:3] glc
s_waitcnt vmcnt(0)
buffer_wbinvl1_vol
If we bring back the system fence, here's the resulting sequence (fence followed by the atomic inc):
buffer_wbl2
s_waitcnt vmcnt(0) lgkmcnt(0)
buffer_invl2
buffer_wbinvl1_vol
s_waitcnt vmcnt(0) lgkmcnt(0)
global_atomic_inc v0, v1, v0, s[2:3] glc
s_waitcnt vmcnt(0)
buffer_wbinvl1_vol
According to the guide, the L2 instructions are required for coherence between different agents. For the reduction use case, we need coherence within a GPU which is provided by buffer_wbinvl1_vol following the atomic inc. So we can drop the L2 instructions here. The duplicate waitcnt and the buffer_wbinvl1_vol can be dropped too.
In other words, as long as the ordering and scope are correct on the atomic operation, we should not need an additional fence. Perhaps it was required before D137524.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D154172/new/
https://reviews.llvm.org/D154172
More information about the Openmp-commits
mailing list