[PATCH] D147408: [AMDGPU] Iterative scan implementation for atomic optimizer.

Tue May 30 03:04:42 PDT 2023

arsenm requested changes to this revision.
arsenm added a comment.
This revision now requires changes to proceed.

Should have some checks on the IR output

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:46
+  ScanOptions ScanImpl;
+  AMDGPUAtomicOptimizer(ScanOptions ScanImpl)
+      : FunctionPass(ID), ScanImpl(ScanImpl) {}
----------------
Should probably add header comment explaining the different strategies 

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:495
+
+  Type *const Ty = I.getType();
+  unsigned WaveFrontSize = ST->isWave32() ? 32 : 64;
----------------
east const throughout 

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:496
+  Type *const Ty = I.getType();
+  unsigned WaveFrontSize = ST->isWave32() ? 32 : 64;
+  Type *const WaveTy = B.getIntNTy(WaveFrontSize);
----------------
ST->getWavefrontSize()

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:515
+    OldValuePhi = B.CreatePHI(Ty, 2, "old_value_phi");
+    OldValuePhi->addIncoming(UndefValue::get(Ty), EntryBB);
+  }
----------------
Use poison for missing values 

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:544
+  // return the next active lane
+  Value *Mask = B.CreateShl(B.getIntN(WaveFrontSize, 1), FF1);
+  Value *InverseMask = B.CreateXor(Mask, B.getIntN(WaveFrontSize, -1));
----------------
You already have WaveTy above 

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:545
+  Value *Mask = B.CreateShl(B.getIntN(WaveFrontSize, 1), FF1);
+  Value *InverseMask = B.CreateXor(Mask, B.getIntN(WaveFrontSize, -1));
+  Value *NewActiveBits = B.CreateAnd(ActiveBits, InverseMask);
----------------
You already have WaveTy above 

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:550
+  // Branch out of the loop when all lanes are processed.
+  Value *IsEnd = B.CreateICmpEQ(NewActiveBits, B.getIntN(WaveFrontSize, 0));
+  B.CreateCondBr(IsEnd, ComputeEnd, ComputeLoop);
----------------
You already have WaveTy above 

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:657
+  Function *F = I.getFunction();
+  LLVMContext &C = I.getParent()->getContext();
+  BasicBlock *ComputeLoop = nullptr;
----------------
F->getContext()

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:693
+      // Alternative implementation for scan
+      ComputeLoop = BasicBlock::Create(C, "ComputeLoop", F);
+      ComputeEnd = BasicBlock::Create(C, "ComputeEnd", F);
----------------
Use consistent naming style for the blocks? other places used snake case 

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp:277-285
+static cl::opt<bool> EnableAtomicOptimizationsUsingDPP(
+    "amdgpu-atomic-optimizer-use-dpp",
+    cl::desc("Use DPP in the atomic optimizer"), cl::init(true), cl::Hidden);
+
 // Enable atomic optimization
-static cl::opt<bool> EnableAtomicOptimizations(
-  "amdgpu-atomic-optimizations",
-  cl::desc("Enable atomic optimizations"),
-  cl::init(false),
-  cl::Hidden);
+static cl::opt<bool>
+    EnableAtomicOptimizations("amdgpu-atomic-optimizations",
----------------
I would expect this to be one cl::enum flag for the optimizer strategy. It would also still be better if this was a parsable pass parameter 

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_optimizations_mul_one.ll:3
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: opt -S -mtriple=amdgcn-- -amdgpu-atomic-optimizer -verify-machineinstrs %s | FileCheck -check-prefix=IR %s
+; RUN: opt -S -mtriple=amdgcn-- -passes=amdgpu-atomic-optimizer -verify-machineinstrs %s | FileCheck -check-prefix=IR %s
 ; RUN: llc -global-isel -mtriple=amdgcn-- -amdgpu-atomic-optimizations -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
----------------
cdevadas wrote:
> -verify-machineinstrs won't do anything here.
-verify-machineinstrs should be removed 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147408/new/

https://reviews.llvm.org/D147408