[PATCH] D156301: [AMDGPU] Support FAdd/FSub global atomics in AMDGPUAtomicOptimizer.

Fri Aug 11 02:26:49 PDT 2023

foad added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:78
   Value *buildReduction(IRBuilder<> &B, AtomicRMWInst::BinOp Op, Value *V,
-                        Value *const Identity) const;
+                        Value *const Identity,
+                        bool isAtomicFloatingPointTy) const;
----------------
No need to pass in isAtomicFloatingPointTy to all these functions. It is just V->getType()->isFloatingPointTy().

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:317

-  const unsigned ValIdx = 0;
-
+  unsigned ValIdx = 0;
   const bool ValDivergent = UA->isDivergentUse(I.getOperandUse(ValIdx));
----------------
Don't need to change this

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:662

-static APInt getIdentityValueForAtomicOp(AtomicRMWInst::BinOp Op,
-                                         unsigned BitWidth) {
+static Constant *getIdentityValueForAtomicOp(LLVMContext &C, Type *const Ty,
+                                             AtomicRMWInst::BinOp Op,
----------------
You can derive C from Ty, and BitWidth from Ty, so the arguments should just be: `AtomicRMWInst::BinOp Op, Type *Ty`

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:843
+      Value *const CtpopFP = B.CreateUIToFP(Ctpop, Ty);
+      NewV = B.CreateFMul(V, CtpopFP);
+      break;
----------------
In general fmul will not give the exact same answer as a sequence of fadds, so you probably need to check some fast math flags before doing this.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156301/new/

https://reviews.llvm.org/D156301