[PATCH] D154858: [WIP] [AMDGPU] Add llvm.amdgcn.reduce.add Intrinsic.
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Jul 10 09:19:43 PDT 2023
arsenm added a comment.
I was thinking an IR expansion would be easier, but it's good to have a machine one (at least for umin)
================
Comment at: llvm/include/llvm/IR/IntrinsicsAMDGPU.td:1929
+def int_amdgcn_reduce_add :
+ Intrinsic<[llvm_i32_ty], [llvm_i32_ty],
+ [IntrNoMem, IntrConvergent, IntrWillReturn, IntrNoCallback, IntrNoFree]>;
----------------
Should have a mangled type. I also think it should have an immarg operand for the preferred lowering strategy to use. Also, wave_reduce?
Also umin would be a better choice for a first one, given that we want it for dynamic alloca handling
================
Comment at: llvm/lib/Target/AMDGPU/SILowerReduceAndScanPseudo.cpp:1
+
+#include "AMDGPU.h"
----------------
Missing header
================
Comment at: llvm/lib/Target/AMDGPU/SILowerReduceAndScanPseudo.cpp:87
+ bool IsWave32 = ST->isWave32();
+ const TargetRegisterClass *RegClass =
+ IsWave32 ? &AMDGPU::SReg_32RegClass : &AMDGPU::SReg_64RegClass;
----------------
There's supposed to be a getWaveRegClass to go through
================
Comment at: llvm/lib/Target/AMDGPU/SILowerReduceAndScanPseudo.cpp:172
+
+bool SIExpandReduceAndScanPseudo::runOnMachineFunction(MachineFunction &MF) {
+
----------------
This doesn't need to be a separate pass, can be a post isel hook
================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgpu.reduce.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
+; RUN: llc -march=amdgcn -mcpu=gfx1100 -mattr=+wavefrontsize32,-wavefrontsize64 < %s | FileCheck %s
+
----------------
test with -global-isel=1/0
================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgpu.reduce.ll:19
+entry:
+ %result = call i32 @llvm.amdgcn.reduce.add(i32 %in)
+ store i32 %result, ptr addrspace(1) %out
----------------
Also add poison and constant tests
================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgpu.reduce.ll:25
+
+define amdgpu_kernel void @divergent_value(ptr addrspace(1) %out, ptr addrspace(1) %val) #0 {
+; CHECK-LABEL: divergent_value:
----------------
Also add a test where this is under divergent control flow
================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgpu.reduce.ll:52
+ %out.kernarg.offset = getelementptr inbounds i8, ptr addrspace(4) %divergent_value.kernarg.segment, i64 36
+ %loaded.out.kernarg.offset = load <2 x i64>, ptr addrspace(4) %out.kernarg.offset, align 4
+ %out.load1 = extractelement <2 x i64> %loaded.out.kernarg.offset, i32 0
----------------
Should strip out most of this test
================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgpu.reduce.ll:73
+
+attributes #0 = {"target-cpu"="gfx1100"}
+attributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none)}
----------------
Drop this, it's redundant with the run line target and breaks adding multiple run targets
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D154858/new/
https://reviews.llvm.org/D154858
More information about the llvm-commits
mailing list