[PATCH] D154858: [AMDGPU] Add llvm.amdgcn.wave.reduce.umin/umax Intrinsic.

Matt Arsenault via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jul 21 10:13:18 PDT 2023


arsenm added a comment.

Mostly lgtm with a few more cleanups



================
Comment at: llvm/docs/AMDGPUUsage.rst:1014
+                                             reduction will be performed using default iterative strategy.
+
   =========================================  ==========================================================
----------------
Probably should mention it's currently only implemented for i32


================
Comment at: llvm/include/llvm/IR/IntrinsicsAMDGPU.td:1931
 
+class AMDGPUWaveReduce<LLVMType data_ty = llvm_any_ty> : Intrinsic<
+    [data_ty], 
----------------
llvm_anyint_ty


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4104
+    bool IsWave32 = ST.isWave32();
+    const TargetRegisterClass *RegClass =
+        IsWave32 ? &AMDGPU::SReg_32RegClass : &AMDGPU::SReg_64RegClass;
----------------
ST.getWaveMaskRegClass


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4127
+    // Branch to ComputeBlock
+    long InitalValue = (Opc == AMDGPU::S_MIN_U32) ? UINT_MAX : 0;
+    auto &TmpSReg =
----------------
uint32_t? std::numeric_limits<uint32_t>::max()?


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4128
+    long InitalValue = (Opc == AMDGPU::S_MIN_U32) ? UINT_MAX : 0;
+    auto &TmpSReg =
+        BuildMI(BB, I, DL, TII->get(MovOpc), LoopIterator).addReg(ExecReg);
----------------
No & on the result of any BuildMI


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4147
+    unsigned SFFOpc = IsWave32 ? AMDGPU::S_FF1_I32_B32 : AMDGPU::S_FF1_I32_B64;
+    auto &FF1 = BuildMI(*ComputeLoop, I, DL, TII->get(SFFOpc), FF1Reg)
+                    .addReg(ActiveBits->getOperand(0).getReg());
----------------
No &


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4149
+                    .addReg(ActiveBits->getOperand(0).getReg());
+    auto &LaneValue = BuildMI(*ComputeLoop, I, DL,
+                              TII->get(AMDGPU::V_READLANE_B32), LaneValueReg)
----------------
No &


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4161
+        IsWave32 ? AMDGPU::S_BITSET0_B32 : AMDGPU::S_BITSET0_B64;
+    auto &NewActiveBits =
+        BuildMI(*ComputeLoop, I, DL, TII->get(BITSETOpc), NewActiveBitsReg)
----------------
No &


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4180
+
+    MRI.replaceRegWith(DstReg, NewAccumulator->getOperand(0).getReg());
+    RetBB = ComputeEnd;
----------------
Could have just use the original register to begin with?


================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.umax.ll:7
+; RUN: llc -march=amdgcn -mcpu=gfx1010 -global-isel=0 -mattr=-wavefrontsize32,+wavefrontsize64 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10DAGISEL,GFX1064DAGISEL %s
+; RUN: llc -march=amdgcn -mcpu=gfx1010 -global-isel=1 -mattr=-wavefrontsize32,+wavefrontsize64 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10GISEL,GFX1064GISEL %s
+; RUN: llc -march=amdgcn -mcpu=gfx1010 -global-isel=0 -mattr=+wavefrontsize32,-wavefrontsize64 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10DAGISEL,GFX1032DAGISEL %s
----------------
don't specify the wavefrontsize features twice, just use the wave64 override and assume wave32 by default


================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.umax.ll:313
+entry:
+    %result = call i32 @llvm.amdgcn.wave.reduce.umax.i32(i32 poison, i32 1)
+    store i32 %result, ptr addrspace(1) %out
----------------
In a follow up commit, AMDGPUInstCombineIntrinsic should also fold these constant cases out


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154858/new/

https://reviews.llvm.org/D154858



More information about the llvm-commits mailing list