[PATCH] D144759: [AMDGPU] Implement idempotent atomic lowering

Fri Mar 3 04:43:48 PST 2023

arsenm added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:13371-13377
+  // We do not need to insert a fence here, memory legalizer will do.
+  LoadInst *LI = Builder.CreateAlignedLoad(
+      AI->getType(), AI->getPointerOperand(), AI->getAlign());
+  LI->setAtomic(Order, SSID);
+  LI->copyMetadata(*AI);
+  LI->takeName(AI);
+  AI->replaceAllUsesWith(LI);
----------------
rampitec wrote:
> rampitec wrote:
> > arsenm wrote:
> > > I don't understand why this is a target hook. Why can't this unconditionally happen in the generic code?
> > Probably not. The only target implementing it is x86 and it issues target specific intrinsics. It also skips 'or' with zero as it claims to have a better lowering.
> Also note that I am skipping the fence on the grounds that memory legalizer will fence it. Otherwise with our address spaces and scopes this would be quite non-trivial and target specific code.
I also don't understand that note. The original atomicrmw wouldn't have implied a fence to begin with?
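
For reference, here is a minimal single-threaded sketch of the value-level equivalence the lowering relies on. It uses standard C++ `std::atomic` rather than the LLVM API, and the helper name `idempotentOrMatchesLoad` is hypothetical. An `or` with 0 never changes the stored value, so the only observable result is the value read, which a plain atomic load also returns. The sketch deliberately says nothing about ordering, scopes, or fences, which is the point still under discussion here.

```cpp
#include <atomic>

// Hypothetical helper, not part of the patch: returns true when an
// idempotent RMW (fetch_or with 0) reads the same value as a plain atomic
// load and leaves the stored value unchanged.
bool idempotentOrMatchesLoad(int Initial) {
  std::atomic<int> Val{Initial};
  // `or` with 0 writes back the value it read, so only the read matters.
  int ViaRMW = Val.fetch_or(0, std::memory_order_acquire);
  // The replacement form: an atomic load with the same ordering.
  int ViaLoad = Val.load(std::memory_order_acquire);
  return ViaRMW == Initial && ViaLoad == Initial;
}
```

Note that this only covers orderings a load can legally carry; an RMW with release semantics has no direct load equivalent, so any such case would need separate handling.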

================
Comment at: llvm/test/CodeGen/AMDGPU/idemponent-atomics.ll:30
+entry:
+  store i32 1, ptr addrspace(1) undef
+  %val = atomicrmw or ptr addrspace(1) %in, i32 0 syncscope("agent-one-as") acquire, align 4
----------------
Avoid storing to undef in new tests.

https://reviews.llvm.org/D144759