[PATCH] D147408: [AMDGPU] Iterative scan implementation for atomic optimizer.
Jay Foad via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Apr 28 05:35:07 PDT 2023
foad added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPU.h:89
+enum class ScanOptions : bool { DPP, Iterative };
+FunctionPass *createAMDGPUAtomicOptimizerPass(bool IsUseDpp);
void initializeAMDGPUAtomicOptimizerPass(PassRegistry &);
----------------
Just "UseDpp"?
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPU.h:245
+
+public:
+ ScanOptions ScanImpl;
----------------
No reason for this to be public.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:501-502
+ B.CreateIntrinsic(Intrinsic::amdgcn_ballot, WaveTy, B.getTrue());
+ Type *const BallotTy = Ballot->getType();
+ const unsigned TyBitWidth = DL->getTypeSizeInBits(BallotTy);
+
----------------
Don't need these. They will always be the same as WaveTy and WaveFrontSize.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:515
+ OldValuePhi = B.CreatePHI(Ty, 2, "old_value_phi");
+ OldValuePhi->addIncoming(V, EntryBB);
+ }
----------------
I think you can use an undef value here instead of V.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:521-522
+ // Call llvm.cttz instrinsic to find the lowest remaining active lane.
+ Function *CttzDecl = Intrinsic::getDeclaration(M, Intrinsic::cttz, WaveTy);
+ Value *FF1 = B.CreateCall(CttzDecl, {ActiveBits, B.getTrue()});
+ Value *LaneIdxInt = B.CreateTrunc(FF1, Ty);
----------------
Use B.CreateIntrinsic, then you don't need M.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:529
+
+ // Perfrom writelane if intermidiate scan results are required later in the
+ // kernel computations
----------------
Typo "perform", "intermediate"
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:533-536
+ Function *WriteLaneDecl =
+ Intrinsic::getDeclaration(M, Intrinsic::amdgcn_writelane, {});
+ OldValue =
+ B.CreateCall(WriteLaneDecl, {Accumulator, LaneIdxInt, OldValuePhi});
----------------
Use B.CreateIntrinsic
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:553
+ Value *IsEnd = B.CreateICmpEQ(NewActiveBits, B.getIntN(TyBitWidth, 0));
+ B.CreateCondBr(IsEnd, ComputeEnd, Compute);
+
----------------
Maybe change "Compute" to "ComputeLoop" everywhere, since it is the body of the loop?
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:658
+ llvm::Function *F = I.getParent()->getParent();
+ LLVMContext &C = I.getParent()->getContext();
----------------
Remove `llvm::`, use `getFunction`
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:664-665
+ if (ValDivergent && ScanImpl == ScanOptions::Iterative) {
+ Compute = BasicBlock::Create(C, "Compute", F);
+ ComputeEnd = BasicBlock::Create(C, "ComputeEnd", F);
+ }
----------------
Sink this down to line 700, where you use them?
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:740-746
+ if (ValDivergent && ScanImpl == ScanOptions::Iterative) {
+ // Only the first active lane will enter the new control flow to update the
+ // value.
+ CallInst *const FirstActiveLane =
+ B.CreateIntrinsic(Intrinsic::amdgcn_readfirstlane, {}, Mbcnt);
+ Cond = B.CreateICmpEQ(Mbcnt, FirstActiveLane);
+ } else {
----------------
I don't think you need to change any of this. The original way of doing the icmp should work in all cases.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:767
+ B.SetInsertPoint(ComputeEnd);
+ Instruction *NewTerminator = Terminator->clone();
+ Terminator->eraseFromParent();
----------------
Why do you need to clone? Could you just do Terminator->removeFromParent then B.Insert(Terminator)?
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp:272-275
+// Enable atomic optimization with DPP
+static cl::opt<bool> EnableAtomicOptimizationsUsingDPP(
+ "amdgpu-atomic-optimizer-use-dpp",
+ cl::desc("Enable atomic optimizations using DPP"), cl::init(true),
----------------
Description seems a bit misleading to me since this option doesn't enable the whole atomic optimizer pass. I would suggest changing it to "Enable use of DPP in the atomic optimizer" or just "Use DPP in the atomic optimizer"
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D147408/new/
https://reviews.llvm.org/D147408
More information about the llvm-commits
mailing list