[PATCH] D154858: [AMDGPU] Add llvm.amdgcn.wave.reduce.umin/umax Intrinsic.
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Jul 21 10:13:18 PDT 2023
arsenm added a comment.
Mostly lgtm with a few more cleanups
================
Comment at: llvm/docs/AMDGPUUsage.rst:1014
+ reduction will be performed using default iterative strategy.
+
========================================= ==========================================================
----------------
Probably should mention it's currently only implemented for i32
================
Comment at: llvm/include/llvm/IR/IntrinsicsAMDGPU.td:1931
+class AMDGPUWaveReduce<LLVMType data_ty = llvm_any_ty> : Intrinsic<
+ [data_ty],
----------------
llvm_anyint_ty
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4104
+ bool IsWave32 = ST.isWave32();
+ const TargetRegisterClass *RegClass =
+ IsWave32 ? &AMDGPU::SReg_32RegClass : &AMDGPU::SReg_64RegClass;
----------------
ST.getWaveMaskRegClass
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4127
+ // Branch to ComputeBlock
+ long InitalValue = (Opc == AMDGPU::S_MIN_U32) ? UINT_MAX : 0;
+ auto &TmpSReg =
----------------
uint32_t? std::numeric_limits<uint32_t>::max()?
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4128
+ long InitalValue = (Opc == AMDGPU::S_MIN_U32) ? UINT_MAX : 0;
+ auto &TmpSReg =
+ BuildMI(BB, I, DL, TII->get(MovOpc), LoopIterator).addReg(ExecReg);
----------------
No & on the result of any BuildMI
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4147
+ unsigned SFFOpc = IsWave32 ? AMDGPU::S_FF1_I32_B32 : AMDGPU::S_FF1_I32_B64;
+ auto &FF1 = BuildMI(*ComputeLoop, I, DL, TII->get(SFFOpc), FF1Reg)
+ .addReg(ActiveBits->getOperand(0).getReg());
----------------
No &
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4149
+ .addReg(ActiveBits->getOperand(0).getReg());
+ auto &LaneValue = BuildMI(*ComputeLoop, I, DL,
+ TII->get(AMDGPU::V_READLANE_B32), LaneValueReg)
----------------
No &
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4161
+ IsWave32 ? AMDGPU::S_BITSET0_B32 : AMDGPU::S_BITSET0_B64;
+ auto &NewActiveBits =
+ BuildMI(*ComputeLoop, I, DL, TII->get(BITSETOpc), NewActiveBitsReg)
----------------
No &
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4180
+
+ MRI.replaceRegWith(DstReg, NewAccumulator->getOperand(0).getReg());
+ RetBB = ComputeEnd;
----------------
Could have just use the original register to begin with?
================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.umax.ll:7
+; RUN: llc -march=amdgcn -mcpu=gfx1010 -global-isel=0 -mattr=-wavefrontsize32,+wavefrontsize64 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10DAGISEL,GFX1064DAGISEL %s
+; RUN: llc -march=amdgcn -mcpu=gfx1010 -global-isel=1 -mattr=-wavefrontsize32,+wavefrontsize64 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10GISEL,GFX1064GISEL %s
+; RUN: llc -march=amdgcn -mcpu=gfx1010 -global-isel=0 -mattr=+wavefrontsize32,-wavefrontsize64 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10DAGISEL,GFX1032DAGISEL %s
----------------
don't specify the wavefrontsize features twice, just use the wave64 override and assume wave32 by default
================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.umax.ll:313
+entry:
+ %result = call i32 @llvm.amdgcn.wave.reduce.umax.i32(i32 poison, i32 1)
+ store i32 %result, ptr addrspace(1) %out
----------------
In a follow up commit, AMDGPUInstCombineIntrinsic should also fold these constant cases out
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D154858/new/
https://reviews.llvm.org/D154858
More information about the llvm-commits
mailing list