[PATCH] D51969: [AMDGPU] Add an AMDGPU specific atomic optimizer.
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Sep 13 05:10:48 PDT 2018
arsenm added a comment.
Should include tests running only the IR pass
================
Comment at: lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:57
+
+ StringRef getPassName() const override { return "AMDGPU Atomic Optimizer"; }
+
----------------
This can be dropped
================
Comment at: lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:60-61
+ void getAnalysisUsage(AnalysisUsage &AU) const override {
+ AU.addRequired<LegacyDivergenceAnalysis>();
+ AU.addRequired<TargetPassConfig>();
+ }
----------------
This needs work, but should preserve the dominator tree
================
Comment at: lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:176
+ if (ValDivergent &&
+ (!HasDPP || (I.getType()->getPrimitiveSizeInBits() != 32))) {
+ return;
----------------
No getPrimitiveSizeInBits
================
Comment at: lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:207
+ // \------------------> exit
+ BasicBlock *const EntryBB = I.getParent();
+ BasicBlock *const ExitBB = EntryBB->splitBasicBlock(&I);
----------------
All the consts are in the wrong place
================
Comment at: lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:212-213
+
+ // Delete the terminator of EntryBB as we are going to replace it.
+ EntryBB->getTerminator()->eraseFromParent();
+
----------------
I'm pretty sure there's a utility function that deals with what you're trying to do, that will also handle the DominatorTree updates for this
================
Comment at: lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:219
+ Type *const Ty = I.getType();
+ const unsigned TyBitWidth = Ty->getPrimitiveSizeInBits();
+ Type *const VecTy = VectorType::get(B.getInt32Ty(), 2);
----------------
You pretty much should never use getPrimitiveSizeInBits directly. Use the DataLayout
================
Comment at: lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:229-234
+ Function *const ReadRegisterIntrinsic = Intrinsic::getDeclaration(
+ I.getModule(), Intrinsic::read_register, B.getInt64Ty());
+ MDNode *const RegName =
+ llvm::MDNode::get(Context, llvm::MDString::get(Context, "exec"));
+ Value *const Metadata = llvm::MetadataAsValue::get(Context, RegName);
+ Value *const Exec = B.CreateCall(ReadRegisterIntrinsic, Metadata);
----------------
You need to mark this call site as convergent
================
Comment at: lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:243-244
+ Value *const ExtractHi = B.CreateExtractElement(BitCast, B.getInt32(1));
+ Function *const MbcntLoIntrinsic =
+ Intrinsic::getDeclaration(I.getModule(), Intrinsic::amdgcn_mbcnt_lo);
+ Function *const MbcntHiIntrinsic =
----------------
CreateIntrinsic
================
Comment at: lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:294-295
+ LaneOffset =
+ B.CreateCall(MovDPPIntrinsic, {NewV, B.getInt32(0x138), B.getInt32(0xf),
+ B.getInt32(0xf), B.getTrue()});
+ LaneOffset = B.CreateCall(WWMIntrinsic, LaneOffset);
----------------
Can you use enums to derive these values
================
Comment at: lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:367-378
+ Value *const BitCast = B.CreateBitCast(PHI, VecTy);
+ Value *const ExtractLo = B.CreateExtractElement(BitCast, B.getInt32(0));
+ Value *const ExtractHi = B.CreateExtractElement(BitCast, B.getInt32(1));
+ Value *const ReadFirstLaneLo =
+ B.CreateCall(ReadFirstLaneIntrinsic, ExtractLo);
+ Value *const ReadFirstLaneHi =
+ B.CreateCall(ReadFirstLaneIntrinsic, ExtractHi);
----------------
The canonical way to do this in IR is lshr x, 32 and truncate
https://reviews.llvm.org/D51969
More information about the llvm-commits
mailing list