[PATCH] D158059: [AMDGPU/wmma] - Disable 3-address syntax for f16

Jessica Del via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Aug 16 02:14:43 PDT 2023


OutOfCache created this revision.
Herald added subscribers: foad, kerbowa, hiraditya, tpr, dstuttard, yaxunl, jvesely, kzhuravl, arsenm.
Herald added a project: All.
OutOfCache requested review of this revision.
Herald added subscribers: llvm-commits, wdng.
Herald added a project: LLVM.

Always keep the wmma instruction with a 16-bit floating-point (f16)
accumulator, `v_wmma_f16_16x16x16_f16`, as a two-address instruction.

This is a prerequisite for an upcoming optimization of wmma with 16-bit
accumulator matrices. We want to pack the results of two separate
`wmma`s into the same registers, so that one matrix occupies the lower
half and the other matrix the upper half of each register.
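A minimal Python sketch of this packed layout, assuming each 32-bit VGPR holds two f16 elements (bits 0-15 from the first matrix, bits 16-31 from the second). The helper names and sample values are hypothetical; the sketch only models the bit layout, not the hardware's element ordering within a matrix.

```python
import struct

def f16_bits(x: float) -> int:
    # IEEE 754 binary16 bit pattern of x, as an unsigned 16-bit int
    return struct.unpack("<H", struct.pack("<e", x))[0]

def bits_f16(b: int) -> float:
    # Interpret the low 16 bits of b as a binary16 value
    return struct.unpack("<e", struct.pack("<H", b & 0xFFFF))[0]

def pack_halves(lo: float, hi: float) -> int:
    # One 32-bit register value: lo in bits 0-15, hi in bits 16-31
    return (f16_bits(hi) << 16) | f16_bits(lo)

def unpack_halves(reg: int) -> tuple:
    return bits_f16(reg), bits_f16(reg >> 16)

# Hypothetical accumulator elements of two independent matrices C0 and C1
c0 = [1.0, 2.0, 3.0, 4.0]
c1 = [0.5, -1.5, 8.0, 0.25]
regs = [pack_halves(a, b) for a, b in zip(c0, c1)]

print(hex(regs[0]))             # 0x38003c00
print(unpack_halves(regs[0]))   # (1.0, 0.5)
```

Each register carries one element of each matrix, which is why the two-matrix variant needs only half as many result registers as keeping the matrices apart.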

We pack the values into the registers before using them as input to the
first `wmma`:

  v_wmma_f16_16x16x16_f16 v[0:7], v[8:15], v[16:23], v[0:7]
  v_wmma_f16_16x16x16_f16 v[0:7], v[24:31], v[35:42], v[0:7] op_sel:[0,0,1]

Therefore, both instructions need to write to the same registers that
hold their accumulator input (`v[0:7]`), overwriting those values in place.
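The following simplified Python model illustrates why the destination must stay tied to the accumulator input. It collapses each wmma into a single per-element multiply-add and assumes that an `op_sel`-selected write leaves the other 16-bit half of the destination unchanged; `wmma_f16_lane` and all values are hypothetical.

```python
import struct

def f16_bits(x: float) -> int:
    # IEEE 754 binary16 bit pattern of x
    return struct.unpack("<H", struct.pack("<e", x))[0]

def bits_f16(b: int) -> float:
    # Interpret the low 16 bits of b as a binary16 value
    return struct.unpack("<e", struct.pack("<H", b & 0xFFFF))[0]

def wmma_f16_lane(dst: int, a: float, b: float, src2: int, hi: bool) -> int:
    """Hypothetical per-element model: dst.half = a * b + src2.half.

    `hi` plays the role of op_sel:[0,0,1] (select the upper 16 bits).
    The unselected half of dst is kept as-is (modelling assumption).
    """
    shift = 16 if hi else 0
    acc = bits_f16(src2 >> shift)
    result = f16_bits(a * b + acc)
    mask = 0xFFFF << shift
    return (dst & ~mask) | (result << shift)

def halves(reg: int) -> tuple:
    return bits_f16(reg), bits_f16(reg >> 16)

# Accumulator register: lower half = 1.0 (matrix 0), upper half = 2.0 (matrix 1)
acc = (f16_bits(2.0) << 16) | f16_bits(1.0)

# Two-address form: dst is tied to src2 for both instructions
two_addr = wmma_f16_lane(acc, 2.0, 3.0, acc, hi=False)            # lower: 2*3 + 1 = 7
two_addr = wmma_f16_lane(two_addr, 0.5, 4.0, two_addr, hi=True)  # upper: 0.5*4 + 2 = 4

# Three-address form: the second result goes to a separate register
# (modelled as zero-initialized) instead of the accumulator register
first = wmma_f16_lane(acc, 2.0, 3.0, acc, hi=False)
three_addr = wmma_f16_lane(0, 0.5, 4.0, first, hi=True)

print(halves(two_addr))    # (7.0, 4.0)
print(halves(three_addr))  # (0.0, 4.0): the first result is not in this register
```

With the tied form, the second instruction writes into the register that already holds the first result, so both halves survive. A three-address conversion would let the second instruction target a different register, which does not contain the first result.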

We have verified the correct behavior by running nod.ai's Stable Diffusion
with these data-layout changes. On average, this change reduced the VGPR
count by 17.17% in the 88 shaders it applied to.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D158059

Files:
  llvm/lib/Target/AMDGPU/VOP3PInstructions.td


Index: llvm/lib/Target/AMDGPU/VOP3PInstructions.td
===================================================================
--- llvm/lib/Target/AMDGPU/VOP3PInstructions.td
+++ llvm/lib/Target/AMDGPU/VOP3PInstructions.td
@@ -855,11 +855,12 @@
 
   defvar WMMAConstraints2Addr = "@earlyclobber $vdst,$vdst = $src2";
   defvar WMMAConstraints3Addr = "@earlyclobber $vdst";
+  defvar isConvertableTo3Addr = !cond(!eq(Instr, "v_wmma_f16_16x16x16_f16"): 0, true: 1);
 
   defvar WMMAProfile = VOPProfileWMMA<P, Suffix, _Src01RC64, Type.hasClamp, Type.hasOpsel>;
   if !eq(Suffix, "_w32") then {
     let Mnemonic = Instr, mayRaiseFPException = 0, ReadsModeReg = 0 in {
-      let Constraints = WMMAConstraints2Addr, isConvertibleToThreeAddress = 1 in {
+      let Constraints = WMMAConstraints2Addr, isConvertibleToThreeAddress = isConvertableTo3Addr in {
         def _twoaddr_w32 : VOP3P_Pseudo<Instr # Suffix, WMMAProfile>;
       }
       let Constraints = WMMAConstraints3Addr, SchedRW = [Write32Bit, Write32Bit] in {
@@ -870,7 +871,7 @@
                             !cast<Instruction>(NAME # _threeaddr_w32)>;
   } else if !eq(Suffix, "_w64") then {
     let Mnemonic = Instr, mayRaiseFPException = 0, ReadsModeReg = 0 in {
-      let Constraints = WMMAConstraints2Addr, isConvertibleToThreeAddress = 1 in {
+      let Constraints = WMMAConstraints2Addr, isConvertibleToThreeAddress = isConvertableTo3Addr in {
         def _twoaddr_w64 : VOP3P_Pseudo<Instr # Suffix, WMMAProfile>;
       }
       let Constraints = WMMAConstraints3Addr, SchedRW = [Write32Bit, Write32Bit] in {
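The new `isConvertableTo3Addr` value can be mirrored in Python to see which instructions keep `isConvertibleToThreeAddress = 1` on their `_twoaddr_w32` and `_twoaddr_w64` pseudos. This is only a sketch of the TableGen `!cond`/`!eq` logic in the diff; the function name is hypothetical and the mnemonic list is illustrative.

```python
# Python mirror of the patch's TableGen expression:
#   !cond(!eq(Instr, "v_wmma_f16_16x16x16_f16"): 0, true: 1)
def is_convertable_to_3addr(instr: str) -> int:
    # Exact string match: only the f16-accumulator WMMA is excluded
    return 0 if instr == "v_wmma_f16_16x16x16_f16" else 1

# Illustrative GFX11 WMMA mnemonics
for instr in ["v_wmma_f32_16x16x16_f16",
              "v_wmma_f16_16x16x16_f16",
              "v_wmma_bf16_16x16x16_bf16"]:
    print(instr, is_convertable_to_3addr(instr))
```

Because `!eq` compares the whole mnemonic, any other wmma variant, including one with a bf16 accumulator, remains convertible to three-address form under this patch.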

