[llvm] [AMDGPU][CodeGen] Fold immediates in src1 operands of V_MAD/MAC/FMA/FMAC. (PR #68002)

Tue Oct 3 07:18:22 PDT 2023

================
@@ -3250,9 +3250,12 @@ bool SIInstrInfo::FoldImmediate(MachineInstr &UseMI, MachineInstr &DefMI,
     MachineOperand *Src2 = getNamedOperand(UseMI, AMDGPU::OpName::src2);
 
     // Multiplied part is the constant: Use v_madmk_{f16, f32}.
-    // We should only expect these to be on src0 due to canonicalization.
----------------
kosarev wrote:

The comment was added long ago in f07833057c01. The tests there don't seem to use any intrinsics, so I guess the comment was referring to fmul/fadd canonicalisation as it worked at the time. Matt (@arsenm) may know better.

The test file, madmk.ll, still exists, but it no longer seems to rely on that custom code.

Canonicalisation does generally make sense to me, and we do canonicalise (fma c, x, y) to (fma x, c, y) in SDAGCombiner. Here, however, we are at a much later stage, dealing with concrete legalised instructions. For V_FMAC_F16/F32 specifically, `SIInstrInfo::legalizeOperandsVOP3()` has special code that inserts an SGPR->VGPR COPY for src1. We then fold the immediate operand of that COPY, producing `V_MOV_B32_e32 <imm>`, but do not fold it any further.
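To illustrate why the fold need not be limited to src0, here is a minimal toy model in C++. It is a sketch only: `Operand`, `MadLike`, `MadMK` and `foldMultiplicandImm` are hypothetical names, not LLVM's `MachineInstr` API, and it assumes the v_madmk form `dst = src0 * K + src1`. Because multiplication commutes, a constant defined by a `mov` can be folded into K whether it feeds src0 or src1; the remaining multiplicand simply becomes madmk's register src0.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical toy operand; not LLVM's MachineOperand.
struct Operand {
  bool IsReg;    // true: virtual register, false: immediate
  unsigned Reg;  // meaningful when IsReg
  int64_t Imm;   // meaningful when !IsReg
};

// Toy stand-in for V_MAD/V_FMA/V_FMAC: dst = Src0 * Src1 + Src2.
struct MadLike {
  Operand Src0, Src1, Src2;
};

// Toy stand-in for v_madmk (assumed form): dst = Src0 * K + Src1.
struct MadMK {
  unsigned Src0;
  int64_t K;
  unsigned Src1;
};

// Try to fold "DefReg = mov Imm" into the multiplied part of Mad.
// The constant may sit in either multiplicand slot; the other
// multiplicand becomes the register src0 of the madmk form.
std::optional<MadMK> foldMultiplicandImm(const MadLike &Mad, unsigned DefReg,
                                         int64_t Imm) {
  // This sketch only handles the all-register form.
  if (!Mad.Src0.IsReg || !Mad.Src1.IsReg || !Mad.Src2.IsReg)
    return std::nullopt;

  const Operand *Other = nullptr;
  if (Mad.Src0.Reg == DefReg)
    Other = &Mad.Src1;  // constant in src0
  else if (Mad.Src1.Reg == DefReg)
    Other = &Mad.Src0;  // constant in src1
  else
    return std::nullopt;  // DefReg is not a multiplicand

  // Keep the sketch simple: bail out if DefReg is used by more than one
  // operand (e.g. x * x or x * y + x).
  if (Other->Reg == DefReg || Mad.Src2.Reg == DefReg)
    return std::nullopt;

  return MadMK{Other->Reg, Imm, Mad.Src2.Reg};
}
```

For example, with src0 = %1, src1 = %2, src2 = %3 and %2 defined as an immediate, the sketch yields madmk with src0 = %1, K = the immediate and src1 = %3, which is the src1 case that a src0-only check would miss.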

https://github.com/llvm/llvm-project/pull/68002