[PATCH] D156301: [AMDGPU] Support FAdd/FSub global atomics in AMDGPUAtomicOptimizer.
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Aug 29 16:12:53 PDT 2023
arsenm accepted this revision.
arsenm added inline comments.
This revision is now accepted and ready to land.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:805-807
+ B.CreateUnaryIntrinsic(Intrinsic::ctpop, Ballot), Int32Ty, false);
+ Value *const CtpopFP = B.CreateUIToFP(Ctpop, Ty);
+ NewV = B.CreateFMul(V, CtpopFP);
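The following is a minimal C++ sketch of why the hunk above is order-sensitive. It rests on an assumption about the code being replaced: every active lane issues its own atomic fadd of the uniform value V. For illustration, those lane-by-lane additions are serialized in lane order, which is only one possible hardware ordering. The optimized path instead performs a single fadd of V * ctpop(ballot), mirroring the `CreateFMul` line. The function names are hypothetical and do not exist in LLVM. `std::bitset::count` stands in for `Intrinsic::ctpop`, and `static_cast<float>` stands in for `CreateUIToFP`.

```cpp
#include <bitset>
#include <cstdint>

// Hypothetical model of the unoptimized code: every active lane issues its own
// atomic fadd of the uniform value v. Lanes are serialized in lane order here,
// which is only one of the orders the hardware might use.
float perLaneAtomicFAdd(float mem, float v, std::uint64_t ballot) {
  for (int lane = 0; lane < 64; ++lane)
    if ((ballot >> lane) & 1u)
      mem += v;  // each addition is rounded to float
  return mem;
}

// Hypothetical model of the optimized code in the hunk: a single atomic fadd
// of v * ctpop(ballot).
float combinedAtomicFAdd(float mem, float v, std::uint64_t ballot) {
  auto ctpop = static_cast<unsigned>(std::bitset<64>(ballot).count());  // ctpop
  float ctpopFP = static_cast<float>(ctpop);                            // uitofp
  return mem + v * ctpopFP;  // fmul, then one fadd
}
```

Consider a memory value of 16777216.0f (2^24, where the spacing between adjacent floats is 2.0), V = 1.0f and two active lanes (ballot 0x3). Each separate addition rounds back to 16777216.0f, but the combined form yields 16777218.0f. This is the reassociation concern raised in the review.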
----------------
pravinjagtap wrote:
> pravinjagtap wrote:
> > arsenm wrote:
> > > We don't have fast-math flags on atomics, but without some kind of reassociation flag you would need to expand this to the sequence of adds.
> > If the logic of `no-of-active-lanes * uniform float value` is not valid for the uniform-value case, can we instead use the logic implemented in `buildScanIteratively` for divergent values (even though the input value to the atomic is uniform)?
> >
> > Or do we want a sequence of additions that avoids the loop (and its branch instructions) in `buildScanIteratively`? We would also need to write back the intermediate values of that sequence of additions if the results are needed later in the kernel.
> CC: @b-sumner @foad
I suppose this is fine. You didn't have any guarantee on the order of the additions before.
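The quoted replies discuss an alternative: an iterative lane-by-lane scan that writes back intermediate values. It can be modeled roughly as follows. This is a hypothetical scalar sketch in the spirit of `buildScanIteratively`, not its actual code, and it makes several assumptions:

- The wave has 64 lanes.
- Active lanes are visited from lowest to highest, as counting trailing zeros of the ballot would.
- Each lane's exclusive prefix is stored. This is the write-back needed when the atomic's result is used later.
- The total that a single atomic would add is accumulated along the way.

`iterativeScan`, `ScanResult`, `lowestSetLane` and `kWaveSize` are illustrative names.

```cpp
#include <array>
#include <cstdint>

constexpr int kWaveSize = 64;  // assumption: wave64

struct ScanResult {
  float total;                             // reduction the single atomic would add
  std::array<float, kWaveSize> exclusive;  // per-lane exclusive prefix (written back)
};

// Index of the lowest set bit; plays the role of cttz on the ballot.
// Precondition: mask != 0.
int lowestSetLane(std::uint64_t mask) {
  int lane = 0;
  while (((mask >> lane) & 1u) == 0)
    ++lane;
  return lane;
}

// Hypothetical scalar model of an iterative scan over the active lanes.
ScanResult iterativeScan(const std::array<float, kWaveSize>& laneValues,
                         std::uint64_t ballot, float identity) {
  ScanResult r{};
  r.exclusive.fill(identity);  // in this model, inactive lanes keep the identity
  float acc = identity;
  while (ballot != 0) {
    int lane = lowestSetLane(ballot);
    r.exclusive[lane] = acc;  // write back the running value for this lane
    acc += laneValues[lane];  // add this lane's value in fixed lane order
    ballot &= ballot - 1;     // clear the lowest set bit
  }
  r.total = acc;
  return r;
}
```

As an example, call it with every lane value set to 1.0f, ballot 0b1011 and identity -0.0f (the identity for fadd). It yields a total of 3.0f and exclusive prefixes of -0.0f, 1.0f and 2.0f for lanes 0, 1 and 3. Even this fixed order sums the lanes' values before the single atomic add to memory. So it does not reproduce a per-lane sequence of atomic adds either.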
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D156301