[llvm] [AMDGPU] Prefer v_madak_f32 over v_madmk_f32 to reduce vgpr pressure (PR #72506)

Thu Nov 16 04:21:50 PST 2023

================
@@ -3454,6 +3454,19 @@ bool SIInstrInfo::FoldImmediate(MachineInstr &UseMI, MachineInstr &DefMI,
       if (!Src2->isReg() || RI.isSGPRClass(MRI->getRegClass(Src2->getReg())))
         return false;
 
+      // If src2 is also a literal constant then we have to choose which one to
+      // fold. In general it is better to choose madak so that the other literal
+      // can be materialized in an sgpr instead of a vgpr:
+      //   s_mov_b32 s0, literal
+      //   v_madak_f32 v0, s0, v0, literal
+      // Instead of:
+      //   v_mov_b32 v1, literal
+      //   v_madmk_f32 v0, v0, literal, v1
+      MachineInstr *Def = MRI->getUniqueVRegDef(Src2->getReg());
+      if (Def && Def->isMoveImmediate() &&
+          !isInlineConstant(Def->getOperand(1)))
+        return false;
+
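The comment in the hunk describes a selection rule. When the multiplicand literal could be folded into v_madmk_f32 but src2 is defined by a move of a non-inline literal, the fold is abandoned so that src2's literal can become the K operand of v_madak_f32 instead. The following is a simplified, self-contained model of that check. `OperandDef`, `isInlineConstant32`, and `shouldFoldIntoMadmk` are hypothetical names that only mirror the shape of the `getUniqueVRegDef` / `isMoveImmediate` / `isInlineConstant` test in the patch; they are not the real SIInstrInfo API. The inline-constant table is also an assumption covering only the common 32-bit float cases.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, simplified view of the instruction that defines src2.
struct OperandDef {
  bool isMoveImmediate;        // defined by a move of an immediate?
  std::optional<uint32_t> imm; // raw 32-bit value when it is
};

// Hypothetical stand-in for an inline-constant check on a 32-bit float
// operand: integers -16..64 plus +-0.5, +-1.0, +-2.0, +-4.0 (0.0 falls in the
// integer range). Ignores 1/(2*pi), which some subtargets also inline.
bool isInlineConstant32(uint32_t bits) {
  int32_t asInt = static_cast<int32_t>(bits);
  if (asInt >= -16 && asInt <= 64)
    return true;
  switch (bits) {
  case 0x3f000000u: case 0xbf000000u: // +-0.5
  case 0x3f800000u: case 0xbf800000u: // +-1.0
  case 0x40000000u: case 0xc0000000u: // +-2.0
  case 0x40800000u: case 0xc0800000u: // +-4.0
    return true;
  default:
    return false;
  }
}

// Models the new early exit: refuse to form v_madmk_f32 when src2 is itself
// a non-inline literal, leaving that literal available for v_madak_f32.
bool shouldFoldIntoMadmk(const OperandDef &src2Def) {
  assert(!src2Def.isMoveImmediate || src2Def.imm.has_value());
  if (src2Def.isMoveImmediate && !isInlineConstant32(*src2Def.imm))
    return false;
  return true;
}
```

For example, an addend of 1000.0f (bit pattern 0x447a0000) makes `shouldFoldIntoMadmk` return false, while an inline addend such as 2.0f (0x40000000) lets the madmk fold go ahead. Per the comment in the patch, the motivation is register class: in the madak form the remaining literal can be materialized with s_mov_b32 into an SGPR, whereas the madmk form needs a v_mov_b32 into a VGPR.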
----------------
dstutt wrote:

Is the way this works that it detects the other case (a second immediate in src2) and relies on another FoldImmediate call for that operand, which will then produce the fmaak instead?

https://github.com/llvm/llvm-project/pull/72506