[llvm] [AMDGPU] Prefer v_madak_f32 over v_madmk_f32 to reduce vgpr pressure (PR #72506)

Thu Nov 16 04:47:21 PST 2023

================
@@ -3454,6 +3454,19 @@ bool SIInstrInfo::FoldImmediate(MachineInstr &UseMI, MachineInstr &DefMI,
       if (!Src2->isReg() || RI.isSGPRClass(MRI->getRegClass(Src2->getReg())))
         return false;
 
+      // If src2 is also a literal constant then we have to choose which one to
+      // fold. In general it is better to choose madak so that the other literal
+      // can be materialized in an sgpr instead of a vgpr:
+      //   s_mov_b32 s0, literal
+      //   v_madak_f32 v0, s0, v0, literal
+      // Instead of:
+      //   v_mov_b32 v1, literal
+      //   v_madmk_f32 v0, v0, literal, v1
+      MachineInstr *Def = MRI->getUniqueVRegDef(Src2->getReg());
+      if (Def && Def->isMoveImmediate() &&
+          !isInlineConstant(Def->getOperand(1)))
+        return false;
+
----------------
jayfoad wrote:

Yeah, sorry if that wasn't clear. The early return relies on the fact that there will (hopefully!) be a later call to foldImmediate, which folds the other literal into the same MAD instruction.

There's also a 50% chance that the literals would have been folded in the other order, in which case we would have folded to MADAK straight away and this patch would have no effect.
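
To make the ordering argument concrete, here is a minimal toy model of the two fold orders. It is only a sketch: `ToyMad`, `toyFoldImmediate`, `toyFoldInOrder` and the `PreferMadak` flag are hypothetical names invented for illustration, not LLVM APIs, and the model ignores everything the real code checks apart from which operand holds a non-inline literal.

```cpp
#include <array>
#include <cassert>

// Toy model only: none of these types or functions exist in LLVM.
enum class Opcode { Mad, Madmk, Madak };

struct ToyMad {
  Opcode Op = Opcode::Mad;
  // IsLiteral[i] is true when source operand i is a register defined by a
  // move of a non-inline literal constant.
  std::array<bool, 3> IsLiteral{};
};

// Tries to fold the literal feeding source operand Idx into MI.
// PreferMadak models the early return added by the patch.
bool toyFoldImmediate(ToyMad &MI, int Idx, bool PreferMadak) {
  assert(Idx >= 0 && Idx < 3 && "a MAD has three source operands");
  if (MI.Op != Opcode::Mad || !MI.IsLiteral[Idx])
    return false; // Already folded, or nothing to fold.
  if (Idx == 2) {
    MI.Op = Opcode::Madak; // The literal becomes the addend.
    return true;
  }
  // Idx is 0 or 1: folding here would produce madmk.
  if (PreferMadak && MI.IsLiteral[2])
    return false; // Decline, and rely on a later fold of src2.
  MI.Op = Opcode::Madmk; // The literal becomes the multiplier.
  return true;
}

// Visits the two literal operands in the given order and returns the opcode
// the MAD ends up with.
Opcode toyFoldInOrder(std::array<int, 2> Order, bool PreferMadak) {
  ToyMad MI;
  MI.IsLiteral = {false, true, true}; // src1 and src2 are both literals.
  for (int Idx : Order)
    toyFoldImmediate(MI, Idx, PreferMadak);
  return MI.Op;
}
```

In this model, visiting src1 before src2 yields madmk without the check and madak with it, while visiting src2 first yields madak either way. That second case is the one where the patch has no effect.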

https://github.com/llvm/llvm-project/pull/72506