[llvm] [AMDGPU] Prefer v_madak_f32 over v_madmk_f32 to reduce vgpr pressure (PR #72506)
David Stuttard via llvm-commits
llvm-commits at lists.llvm.org
Thu Nov 16 04:21:50 PST 2023
================
@@ -3454,6 +3454,19 @@ bool SIInstrInfo::FoldImmediate(MachineInstr &UseMI, MachineInstr &DefMI,
if (!Src2->isReg() || RI.isSGPRClass(MRI->getRegClass(Src2->getReg())))
return false;
+ // If src2 is also a literal constant then we have to choose which one to
+ // fold. In general it is better to choose madak so that the other literal
+ // can be materialized in an sgpr instead of a vgpr:
+ // s_mov_b32 s0, literal
+ // v_madak_f32 v0, s0, v0, literal
+ // Instead of:
+ // v_mov_b32 v1, literal
+ // v_madmk_f32 v0, v0, literal, v1
+ MachineInstr *Def = MRI->getUniqueVRegDef(Src2->getReg());
+ if (Def && Def->isMoveImmediate() &&
+ !isInlineConstant(Def->getOperand(1)))
+ return false;
+
----------------
dstutt wrote:
Does this work by detecting the other case (a second immediate in src2) and relying on a separate FoldImmediate call for that operand, which then produces the fmaak instead?
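To make the interaction in that question concrete, here is a small, self-contained C++ model. It is a sketch, not LLVM code: `MadOperands`, `FoldResult`, `tryFold` and `runFolds` are hypothetical names. The model assumes one fold attempt per literal-defining move, visited in either order, and treats only src1 as the madmk candidate. Under those assumptions, the src1 attempt bails whenever src2 is also a non-inline literal (the check the patch adds), so the src2 attempt succeeds and the result is the madak form regardless of visiting order.

```cpp
#include <cassert>

// Hypothetical, simplified model of v_mad_f32 dst, src0, src1, src2.
// Only records whether src1/src2 are fed by a move of a literal that is
// NOT an inline constant; register classes are not modeled.
struct MadOperands {
  bool src1IsLiteral;
  bool src2IsLiteral;
};

enum class FoldResult { None, MadMK, MadAK };

// One hypothetical FoldImmediate-style attempt: fold the literal feeding
// operand `opIdx` (1 => madmk form, 2 => madak form).
FoldResult tryFold(const MadOperands &mad, int opIdx) {
  assert(opIdx == 1 || opIdx == 2);
  if (opIdx == 1 && mad.src1IsLiteral) {
    // Mirrors the check added by the patch: if src2 is also a
    // non-inline literal, refuse madmk so the src2 fold can win.
    if (mad.src2IsLiteral)
      return FoldResult::None;
    return FoldResult::MadMK;  // v_madmk_f32 dst, src0, K, src2
  }
  if (opIdx == 2 && mad.src2IsLiteral)
    return FoldResult::MadAK;  // v_madak_f32 dst, src0, src1, K
  return FoldResult::None;
}

// Visit the two literal-defining moves in the given order. Once one literal
// has been folded the instruction is no longer a plain MAD, so stop there.
FoldResult runFolds(const MadOperands &mad, int first, int second) {
  FoldResult r = tryFold(mad, first);
  if (r != FoldResult::None)
    return r;
  return tryFold(mad, second);
}
```

For example, with both src1 and src2 literal, `runFolds` returns `FoldResult::MadAK` for either order, leaving the src1 literal free to be materialized with `s_mov_b32`; with only src1 literal it returns `FoldResult::MadMK`. Whether the real pass actually makes that second call for the src2 operand is what the review question asks; the model simply assumes it does.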
https://github.com/llvm/llvm-project/pull/72506