[PATCH] D156301: [AMDGPU] Support FAdd/FSub global atomics in AMDGPUAtomicOptimizer.
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Aug 23 16:15:26 PDT 2023
arsenm added a comment.
Are the IR check lines missing? I thought you added some in a previous diff.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:355
+ case AtomicRMWInst::FAdd:
+ return B.CreateBinOp(Instruction::FAdd, LHS, RHS);
case AtomicRMWInst::Sub:
----------------
Can you use B.CreateFAdd instead of the low-level CreateBinOp? You'll need that to handle strictfp correctly.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:358-359
return B.CreateBinOp(Instruction::Sub, LHS, RHS);
+ case AtomicRMWInst::FSub:
+ return B.CreateBinOp(Instruction::FSub, LHS, RHS);
case AtomicRMWInst::And:
----------------
Ditto: use B.CreateFSub here instead of CreateBinOp.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:805-807
+ B.CreateUnaryIntrinsic(Intrinsic::ctpop, Ballot), Int32Ty, false);
+ Value *const CtpopFP = B.CreateUIToFP(Ctpop, Ty);
+ NewV = B.CreateFMul(V, CtpopFP);
----------------
We don't have fast math flags on atomics. Without some kind of reassociate flag, multiplying by the lane count is not equivalent to the original adds, so you would need to expand this to the add sequence.
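To illustrate why the multiply needs a reassociate flag, here is a minimal host-side sketch, not code from the patch. It assumes IEEE-754 binary32 floats with round-to-nearest and no fast-math compiler options. The helper names addSequence and mulByCount are hypothetical: the first models each active lane adding the uniform value in turn, the second models the uitofp(ctpop(ballot)) * V shortcut.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: models N active lanes each adding the same
// uniform value V to the accumulator, one rounding step per add.
float addSequence(float V, unsigned N) {
  float Acc = 0.0f;
  for (unsigned I = 0; I < N; ++I)
    Acc += V;
  return Acc;
}

// Hypothetical helper: models the shortcut that multiplies V by the
// active lane count, which rounds only once.
float mulByCount(float V, unsigned N) {
  return V * static_cast<float>(N);
}

// Prints both results so the difference, if any, is visible.
void compare(float V, unsigned N) {
  const float Seq = addSequence(V, N);
  const float Mul = mulByCount(V, N);
  std::printf("V=%.9g N=%u add-sequence=%.9g multiply=%.9g (%s)\n",
              V, N, Seq, Mul, Seq == Mul ? "same" : "different");
}
```

Calling compare(0.1f, 10) should report the two results as different, because the add sequence rounds after every step while the multiply rounds once. Calling compare(1.0f, 8) should report them as the same, since every intermediate value is exactly representable.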
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D156301/new/
https://reviews.llvm.org/D156301