[PATCH] D154858: [WIP] [AMDGPU] Add llvm.amdgcn.wave.reduce.umin/umax Intrinsic.
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 12 07:51:04 PDT 2023
arsenm added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4089
+ ScanStratgyImm == 0 ? ScanOptions::DPP : ScanOptions::Iterative;
+ if (ScanStrategy == ScanOptions::Iterative) {
+ // To reduce the VGPR, we need to iterative over all the active lanes.
----------------
I was envisioning this as just a hint: if the requested strategy is unimplemented (or the target doesn't support it), the lowering would just fall back to one that works.
The intrinsic should also be documented in AMDGPUUsage, including the accepted values for this operand.
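
To illustrate the "hint with fallback" idea, here is a minimal sketch rather than the patch's actual code. The 0 = DPP encoding is taken from the quoted diff; `decodeHint`, `selectStrategy`, and the `HasDPPLowering` flag are hypothetical names, and it is an assumption that the iterative lowering is the one that always works.

```cpp
// Hypothetical sketch; none of these helpers exist in the patch.
enum class ScanOptions { DPP, Iterative };

// Encoding assumed from the quoted diff: immediate 0 requests DPP,
// any other value requests the iterative lowering.
ScanOptions decodeHint(unsigned StrategyImm) {
  return StrategyImm == 0 ? ScanOptions::DPP : ScanOptions::Iterative;
}

// Treat the immediate as a hint only. If the requested lowering is not
// available (unimplemented, or unsupported on the target), fall back to the
// iterative lowering, which is assumed here to be available everywhere.
ScanOptions selectStrategy(unsigned StrategyImm, bool HasDPPLowering) {
  ScanOptions Requested = decodeHint(StrategyImm);
  if (Requested == ScanOptions::DPP && !HasDPPLowering)
    return ScanOptions::Iterative;
  return Requested;
}
```

With this shape, an IR producer can always pass its preferred strategy, and codegen never has to reject a valid program because of the hint.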
================
Comment at: llvm/lib/Target/AMDGPU/SIInstructions.td:267
+
+ def WAVE_REDUCE_UMAX_PSEUDO : VPseudoInstSI <(outs SGPR_32:$sdst),
+ (ins VSrc_b32: $src, VSrc_b32:$strategy),
----------------
These pseudos need _U32/_B32 suffixes.
================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.umax.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3
+; RUN: llc -march=amdgcn -mcpu=gfx1100 -global-isel=0 -mattr=+wavefrontsize32,-wavefrontsize64 < %s | FileCheck %s
+
----------------
Should test with both wave sizes and for every generation, with both global-isel=0 and global-isel=1.
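
As a rough picture of the requested coverage, the sketch below enumerates RUN lines over targets, wave sizes, and both selectors. The CPU list is an illustrative subset rather than every generation, `Target` and `buildRunLines` are hypothetical names, and wave32 is skipped before GFX10 because those targets do not support it. A real test would also need distinct FileCheck prefixes per configuration.

```cpp
#include <string>
#include <vector>

// Hypothetical helper types; the CPU list is an illustrative subset only.
struct Target {
  std::string Cpu;
  bool HasWave32; // wave32 is only available from GFX10 onwards
};

// Build one RUN line per (target, wave size, global-isel) combination,
// skipping wave32 on targets that do not support it.
std::vector<std::string> buildRunLines() {
  const std::vector<Target> Targets = {
      {"tonga", false}, {"gfx900", false}, {"gfx1010", true}, {"gfx1100", true}};
  std::vector<std::string> Lines;
  for (const Target &T : Targets) {
    for (int WaveSize : {32, 64}) {
      if (WaveSize == 32 && !T.HasWave32)
        continue;
      const std::string Attr = WaveSize == 32
                                   ? "+wavefrontsize32,-wavefrontsize64"
                                   : "-wavefrontsize32,+wavefrontsize64";
      for (int GISel : {0, 1}) {
        Lines.push_back("; RUN: llc -march=amdgcn -mcpu=" + T.Cpu +
                        " -global-isel=" + std::to_string(GISel) +
                        " -mattr=" + Attr + " < %s | FileCheck %s");
      }
    }
  }
  return Lines;
}
```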
================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.umin.ll:4
+
+declare i32 @llvm.amdgcn.wave.reduce.umin(i32, i32)
+declare i32 @llvm.amdgcn.workitem.id.x()
----------------
Put the immarg attribute on the declarations.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D154858/new/
https://reviews.llvm.org/D154858