[PATCH] D154858: [WIP] [AMDGPU] Add llvm.amdgcn.wave.reduce.umin/umax Intrinsic.

Wed Jul 12 07:51:04 PDT 2023

arsenm added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4089
+        ScanStratgyImm == 0 ? ScanOptions::DPP : ScanOptions::Iterative;
+    if (ScanStrategy == ScanOptions::Iterative) {
+      // To reduce the VGPR, we need to iterative over all the active lanes.
----------------
I was envisioning this as just a hint: if the requested strategy is unimplemented (or the target doesn't support it), it would just fall back to one that works.

Should also add some intrinsic documentation to AMDGPUUsage listing the accepted values for this operand.
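
A minimal sketch of the "hint with fallback" behaviour described above, not the patch's actual code. The encoding (0 selects DPP, anything else selects Iterative) mirrors the quoted line; the `TargetCaps` struct, its fields, and `resolveScanStrategy` are hypothetical names introduced only for illustration, and the sketch assumes the iterative strategy is available on every target.

```cpp
#include <cstdint>

// Strategy names follow the quoted hunk; everything else is hypothetical.
enum class ScanOptions { DPP, Iterative };

// Hypothetical capability summary for the current subtarget.
struct TargetCaps {
  bool HasDPP;               // target has DPP instructions at all
  bool DPPReduceImplemented; // the DPP lowering exists for this operation
};

// Treat the immediate as a hint. If the requested strategy cannot be used,
// fall back to the iterative strategy, which is assumed to work everywhere.
ScanOptions resolveScanStrategy(uint64_t StrategyImm, const TargetCaps &Caps) {
  // Same decoding as the quoted line: 0 requests DPP, otherwise Iterative.
  ScanOptions Requested =
      StrategyImm == 0 ? ScanOptions::DPP : ScanOptions::Iterative;

  if (Requested == ScanOptions::DPP &&
      !(Caps.HasDPP && Caps.DPPReduceImplemented))
    return ScanOptions::Iterative; // hint cannot be honoured; do not error out

  return Requested;
}
```

With this shape the operand never causes a selection failure; an unsupported hint only changes which lowering is used.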

================
Comment at: llvm/lib/Target/AMDGPU/SIInstructions.td:267
+
+  def WAVE_REDUCE_UMAX_PSEUDO : VPseudoInstSI <(outs SGPR_32:$sdst),
+    (ins VSrc_b32: $src, VSrc_b32:$strategy),
----------------
These need _U32/_B32 suffixes.

================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.umax.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3
+; RUN: llc -march=amdgcn -mcpu=gfx1100 -global-isel=0 -mattr=+wavefrontsize32,-wavefrontsize64 < %s | FileCheck %s
+
----------------
Should test with both wave sizes (wave32 and wave64) and for every generation, with both -global-isel=0 and -global-isel=1.

================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.umin.ll:4
+
+declare i32 @llvm.amdgcn.wave.reduce.umin(i32, i32)
+declare i32 @llvm.amdgcn.workitem.id.x()
----------------
Put the immarg attribute on the declarations.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154858/new/

https://reviews.llvm.org/D154858