[PATCH] D158468: [AMDGPU] Support sdot4 / sdot8 intrinsics on gfx11

Tue Aug 22 11:23:48 PDT 2023

jrbyrnes added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/VOP3PInstructions.td:437-452
 defm V_DOT4_I32_IU8 : VOP3PDOTIUInst<"v_dot4_i32_iu8", int_amdgcn_sudot4>;
 defm V_DOT8_I32_IU4 : VOP3PDOTIUInst<"v_dot8_i32_iu4", int_amdgcn_sudot8>;
+
+def : GCNPat < (int_amdgcn_sdot8 i32:$src0,
+                                 i32:$src1,
+                                 i32:$src2, (i1 timm:$clamp)),
+               (V_DOT8_I32_IU4  (i32 8), i32:$src0,
----------------
arsenm wrote:
> I don't understand how these cases are different, the intrinsic name is just slightly different from the instruction name?
On all other targets with 8-bit and 4-bit signed dot instructions, we codegen int_amdgcn_sdot4 and int_amdgcn_sdot8. However, we don't support these on gfx1100 -- instead, gfx1100 has int_amdgcn_sUdot4 / int_amdgcn_sUdot8. The result is that users of these intrinsics must always check the target and use the corresponding one (sudot4 for gfx1100, and sdot4 for all others).

This patch removes that responsibility from the user: they can use sdot4 across all targets and still get the corresponding instructions.
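
To illustrate the kind of per-target check users have to carry today, here is a rough C++ sketch. It is not code from this patch or from LLVM: the helper name `pickSignedDot4Intrinsic` is hypothetical, and matching on a "gfx11" name prefix is an assumption made only for the example.

```cpp
#include <string>

// Hypothetical frontend helper (not part of LLVM): returns the IR name of the
// signed dot4 intrinsic a user would have to emit for a given GPU name.
// Assumption: gfx11 targets are recognised by a "gfx11" name prefix.
std::string pickSignedDot4Intrinsic(const std::string &gpu) {
  // rfind(prefix, 0) == 0 is the usual pre-C++20 "starts with" idiom.
  const bool isGfx11 = gpu.rfind("gfx11", 0) == 0;
  // gfx11 only has the sudot4 form; other targets with signed dot use sdot4.
  return isGfx11 ? "llvm.amdgcn.sudot4" : "llvm.amdgcn.sdot4";
}
```

With the pattern in this patch, a helper like this should not be needed; the user would emit llvm.amdgcn.sdot4 unconditionally.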

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158468/new/

https://reviews.llvm.org/D158468