[Openmp-commits] [PATCH] D154172: [OpenMP] Added memory scope to atomic::inc API and used the device scope in reduction.

Dhruva Chakrabarti via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Fri Jun 30 00:17:57 PDT 2023


dhruvachak added inline comments.


================
Comment at: openmp/libomptarget/DeviceRTL/src/Reduction.cpp:226
 
+#ifndef __AMDGCN__
     fence::system(atomic::seq_cst);
----------------
jdoerfert wrote:
> dhruvachak wrote:
> > arsenm wrote:
> > > I doubt this is target specific 
> > Well, I don't know whether it is required for NVPTX for example, so I kept it around.
> > 
> > As an aside, I think the system fence is too conservative, but again I kept it unchanged for archs other than amdgpu.
> > As an aside, I think the system fence is too conservative, but again I kept it unchanged for archs other than amdgpu.
> 
> 1) This is unrelated.
> 2) Could you elaborate on why you think the change is sound (on AMDGPUs)?
1. Agreed this is unrelated. I can separate this out if you like.
2. With reference to https://llvm.org/docs/AMDGPUUsage.html#amdgpu-amdhsa-memory-model-gfx90a (gfx90a as an example), the guide points out the required instructions. And that's what we get with this patch for the atomic inc:

	s_waitcnt vmcnt(0) lgkmcnt(0)
	global_atomic_inc v0, v1, v0, s[2:3] glc
	s_waitcnt vmcnt(0)
	buffer_wbinvl1_vol

If we bring back the system fence, here's the resulting sequence (fence followed by the atomic inc):

	buffer_wbl2
	s_waitcnt vmcnt(0) lgkmcnt(0)
	buffer_invl2
	buffer_wbinvl1_vol
	s_waitcnt vmcnt(0) lgkmcnt(0)
	global_atomic_inc v0, v1, v0, s[2:3] glc
	s_waitcnt vmcnt(0)
	buffer_wbinvl1_vol

According to the guide, the L2 instructions are required for coherence between different agents. For the reduction use case, we need coherence within a GPU which is provided by buffer_wbinvl1_vol following the atomic inc. So we can drop the L2 instructions here. The duplicate waitcnt and the buffer_wbinvl1_vol can be dropped too.

In other words, as long as the ordering and scope are correct on the atomic operation, we should not need an additional fence. Perhaps it was required before D137524.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154172/new/

https://reviews.llvm.org/D154172



More information about the Openmp-commits mailing list