[PATCH] D144759: [AMDGPU] Implement idempotent atomic lowering

Stanislav Mekhanoshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Mar 3 13:51:58 PST 2023


rampitec added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:13371-13377
+  // We do not need to insert a fence here; the memory legalizer will do it.
+  LoadInst *LI = Builder.CreateAlignedLoad(
+      AI->getType(), AI->getPointerOperand(), AI->getAlign());
+  LI->setAtomic(Order, SSID);
+  LI->copyMetadata(*AI);
+  LI->takeName(AI);
+  AI->replaceAllUsesWith(LI);
----------------
arsenm wrote:
> rampitec wrote:
> > rampitec wrote:
> > > arsenm wrote:
> > > > I don't understand why this is a target hook. Why can't this unconditionally happen in the generic code?
> > > Probably not. The only target implementing it is x86, and it issues target-specific intrinsics. It also skips 'or' with zero because it claims to have a better lowering for that case.
> > Also note that I am skipping the fence on the grounds that the memory legalizer will fence it. Otherwise, with our address spaces and scopes, this would be quite non-trivial and target-specific code.
> I also don't understand that note. The original atomicrmw wouldn't have implied a fence to begin with?
The name of the function implies a fenced load. The actual explanation of why a fence is needed when there was no fence on the atomicrmw is in the x86 code (although even x86 skips it in some situations). My understanding is that this optimization removes a store, so if we had a store before the load, it needs to be fenced, but actually only if the ordering is stronger than release. This is why I have added potentially aliasing stores before the atomicrmw in the test: to verify that the memory legalizer will properly fence it.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144759/new/

https://reviews.llvm.org/D144759
