[PATCH] D158468: [AMDGPU] Support sdot4 / sdot8 intrinsics on gfx11

Tue Aug 22 12:05:02 PDT 2023

arsenm added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/VOP3PInstructions.td:437-452
 defm V_DOT4_I32_IU8 : VOP3PDOTIUInst<"v_dot4_i32_iu8", int_amdgcn_sudot4>;
 defm V_DOT8_I32_IU4 : VOP3PDOTIUInst<"v_dot8_i32_iu4", int_amdgcn_sudot8>;
+
+def : GCNPat < (int_amdgcn_sdot8 i32:$src0,
+                                 i32:$src1,
+                                 i32:$src2, (i1 timm:$clamp)),
+               (V_DOT8_I32_IU4  (i32 8), i32:$src0,
----------------
jrbyrnes wrote:
> arsenm wrote:
> > I don't understand how these cases are different. Is the intrinsic name just slightly different from the instruction name?
> On all other targets with 8-bit and 4-bit signed dot, we codegen for int_amdgcn_sdot4 and int_amdgcn_sdot8. However, we don't support these on gfx1100 -- instead, gfx1100 has int_amdgcn_sUdot4 / int_amdgcn_sUdot8. The result is that users of these intrinsics must always check the target to use the corresponding one (sudot4 for gfx1100, and sdot4 for all others).
> 
> This removes that responsibility from the user, so they are able to use sdot4 across all targets and generate the corresponding instructions.
> 
Are there unit tests for these somewhere? I don't really know the full history of these instructions, and I'm worried there was some random edge-case behavior change.
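
For reference, here is a rough host-side model of the equivalence the patterns rely on, as I understand it: sdot4 should match sudot4 when both operands are marked signed. This is only a sketch, not LLVM code. The names sdot4_ref, sudot4_ref and lane8 are made up for illustration, the argument order mirrors my reading of the int_amdgcn_sudot4 signature, and clamp is left out entirely, so it says nothing about the saturating case.

```cpp
#include <cstdint>

// Hypothetical reference model (not part of LLVM): extract 8-bit lane i of v
// and widen it as signed or unsigned.
static int32_t lane8(uint32_t v, int i, bool is_signed) {
    uint8_t raw = static_cast<uint8_t>(v >> (8 * i));
    return is_signed ? static_cast<int32_t>(static_cast<int8_t>(raw))
                     : static_cast<int32_t>(raw);
}

// Assumed semantics of sudot4 without clamp: each operand carries its own
// signedness flag, products are accumulated into c with 32-bit wraparound.
int32_t sudot4_ref(bool a_signed, uint32_t a, bool b_signed, uint32_t b,
                   int32_t c) {
    uint32_t acc = static_cast<uint32_t>(c);
    for (int i = 0; i < 4; ++i) {
        int32_t prod = lane8(a, i, a_signed) * lane8(b, i, b_signed);
        acc += static_cast<uint32_t>(prod);  // modular add, no signed overflow
    }
    return static_cast<int32_t>(acc);
}

// Assumed semantics of sdot4 without clamp: both operands are signed, which
// is the case the new patterns would map onto the IU instruction.
int32_t sdot4_ref(uint32_t a, uint32_t b, int32_t c) {
    return sudot4_ref(/*a_signed=*/true, a, /*b_signed=*/true, b, c);
}
```

With a = 0xFF020301, b = 0x01FF0204 and c = 10, the signed interpretation gives 17 while the all-unsigned interpretation gives 785, so a test that includes negative lanes would catch a pattern that forgets to set the sign bits.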

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158468/new/

https://reviews.llvm.org/D158468