[PATCH] D147408: [AMDGPU] Iterative scan implementation for atomic optimizer.
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue May 30 03:04:42 PDT 2023
arsenm requested changes to this revision.
arsenm added a comment.
This revision now requires changes to proceed.
Should have some checks on the IR output
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:46
+ ScanOptions ScanImpl;
+ AMDGPUAtomicOptimizer(ScanOptions ScanImpl)
+ : FunctionPass(ID), ScanImpl(ScanImpl) {}
----------------
Should probably add header comment explaining the different strategies
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:495
+
+ Type *const Ty = I.getType();
+ unsigned WaveFrontSize = ST->isWave32() ? 32 : 64;
----------------
east const throughout
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:496
+ Type *const Ty = I.getType();
+ unsigned WaveFrontSize = ST->isWave32() ? 32 : 64;
+ Type *const WaveTy = B.getIntNTy(WaveFrontSize);
----------------
ST->getWavefrontSize()
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:515
+ OldValuePhi = B.CreatePHI(Ty, 2, "old_value_phi");
+ OldValuePhi->addIncoming(UndefValue::get(Ty), EntryBB);
+ }
----------------
Use poison for missing values
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:544
+ // return the next active lane
+ Value *Mask = B.CreateShl(B.getIntN(WaveFrontSize, 1), FF1);
+ Value *InverseMask = B.CreateXor(Mask, B.getIntN(WaveFrontSize, -1));
----------------
You already have WaveTy above
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:545
+ Value *Mask = B.CreateShl(B.getIntN(WaveFrontSize, 1), FF1);
+ Value *InverseMask = B.CreateXor(Mask, B.getIntN(WaveFrontSize, -1));
+ Value *NewActiveBits = B.CreateAnd(ActiveBits, InverseMask);
----------------
You already have WaveTy above
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:550
+ // Branch out of the loop when all lanes are processed.
+ Value *IsEnd = B.CreateICmpEQ(NewActiveBits, B.getIntN(WaveFrontSize, 0));
+ B.CreateCondBr(IsEnd, ComputeEnd, ComputeLoop);
----------------
You already have WaveTy above
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:657
+ Function *F = I.getFunction();
+ LLVMContext &C = I.getParent()->getContext();
+ BasicBlock *ComputeLoop = nullptr;
----------------
F->getContext()
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:693
+ // Alternative implementation for scan
+ ComputeLoop = BasicBlock::Create(C, "ComputeLoop", F);
+ ComputeEnd = BasicBlock::Create(C, "ComputeEnd", F);
----------------
Use consistent naming style for the blocks? other places used snake case
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp:277-285
+static cl::opt<bool> EnableAtomicOptimizationsUsingDPP(
+ "amdgpu-atomic-optimizer-use-dpp",
+ cl::desc("Use DPP in the atomic optimizer"), cl::init(true), cl::Hidden);
+
// Enable atomic optimization
-static cl::opt<bool> EnableAtomicOptimizations(
- "amdgpu-atomic-optimizations",
- cl::desc("Enable atomic optimizations"),
- cl::init(false),
- cl::Hidden);
+static cl::opt<bool>
+ EnableAtomicOptimizations("amdgpu-atomic-optimizations",
----------------
I would expect this to be one cl::enum flag for the optimizer strategy. It would also still be better if this was a parsable pass parameter
================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_optimizations_mul_one.ll:3
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: opt -S -mtriple=amdgcn-- -amdgpu-atomic-optimizer -verify-machineinstrs %s | FileCheck -check-prefix=IR %s
+; RUN: opt -S -mtriple=amdgcn-- -passes=amdgpu-atomic-optimizer -verify-machineinstrs %s | FileCheck -check-prefix=IR %s
; RUN: llc -global-isel -mtriple=amdgcn-- -amdgpu-atomic-optimizations -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
----------------
cdevadas wrote:
> -verify-machineinstrs won't do anything here.
-verify-machineinstrs should be removed
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D147408/new/
https://reviews.llvm.org/D147408
More information about the llvm-commits
mailing list