[PATCH] D147408: [AMDGPU] Iterative scan implementation for atomic optimizer.
Pravin Jagtap via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Apr 28 21:44:53 PDT 2023
pravinjagtap added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:664-665
+ if (ValDivergent && ScanImpl == ScanOptions::Iterative) {
+ Compute = BasicBlock::Create(C, "Compute", F);
+ ComputeEnd = BasicBlock::Create(C, "ComputeEnd", F);
+ }
----------------
foad wrote:
> Sink this down to line 700, where you use them?
`ComputeEnd` is required at lines 766 and 770, after the `if (ValDivergent)` block. That's why it is hoisted here.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:740-746
+ if (ValDivergent && ScanImpl == ScanOptions::Iterative) {
+ // Only the first active lane will enter the new control flow to update the
+ // value.
+ CallInst *const FirstActiveLane =
+ B.CreateIntrinsic(Intrinsic::amdgcn_readfirstlane, {}, Mbcnt);
+ Cond = B.CreateICmpEQ(Mbcnt, FirstActiveLane);
+ } else {
----------------
foad wrote:
> I don't think you need to change any of this. The original way of doing the icmp should work in all cases.
Actually, no. In the WWM approach, lane 0 always updates the final value in a wavefront, whereas in the iterative approach the first active lane updates it, and the first active lane is not always lane 0.
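To make the lane selection concrete, here is a small host-side C++ model of the `Mbcnt == readfirstlane(Mbcnt)` comparison from the patch hunk above. It is only a sketch, not code from the pass: the 8-lane wavefront, the `exec` bitmask, and the helpers `mbcnt`, `firstActiveLane`, and `selectedLanes` are hypothetical names, and it assumes `Mbcnt` holds the number of active lanes below the current lane.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical 8-lane wavefront, for illustration only.
constexpr unsigned kLanes = 8;

// Models Mbcnt under the stated assumption: the number of set bits in
// `mask` strictly below `lane`.
unsigned mbcnt(uint32_t mask, unsigned lane) {
  assert(lane < 32 && "lane index out of range for a 32-bit mask");
  return static_cast<unsigned>(
      std::bitset<32>(mask & ((1u << lane) - 1u)).count());
}

// Index of the lowest active lane in `exec`, or -1 if no lane is active.
int firstActiveLane(uint32_t exec) {
  for (unsigned lane = 0; lane < kLanes; ++lane)
    if (exec & (1u << lane))
      return static_cast<int>(lane);
  return -1;
}

// Active lanes for which the modeled `Mbcnt == readfirstlane(Mbcnt)` holds.
// readfirstlane is modeled as "the value held by the first active lane".
std::vector<unsigned> selectedLanes(uint32_t exec) {
  std::vector<unsigned> selected;
  const int first = firstActiveLane(exec);
  if (first < 0)
    return selected; // no active lanes, nothing to select
  const unsigned firstValue = mbcnt(exec, static_cast<unsigned>(first));
  for (unsigned lane = 0; lane < kLanes; ++lane) {
    const bool active = (exec & (1u << lane)) != 0;
    if (active && mbcnt(exec, lane) == firstValue)
      selected.push_back(lane);
  }
  return selected;
}
```

In this model, calling `selectedLanes(0xB4)` (lanes 2, 4, 5 and 7 active) selects only lane 2, while a fully active mask `0xFF` selects lane 0.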
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D147408/new/
https://reviews.llvm.org/D147408