[PATCH] D96614: [AMDGPU][GlobalISel] Transform (fsub (fmul x, y), z) -> (fma x, y, -z)

Fri Feb 12 09:39:08 PST 2021

arsenm added inline comments.

================
Comment at: llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp:4085-4086
+
+  bool LegalOperations =
+      isLegal({TargetOpcode::G_FADD, {DstType, SrcType}});
+  // Floating-point multiply-add with intermediate rounding.
----------------
This should probably allow vectors we can break down later too

================
Comment at: llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp:4088
+  // Floating-point multiply-add with intermediate rounding.
+  bool HasFMAD = (LegalOperations && TLI.isFMADLegal(MI, DstType));
+  // Floating-point multiply-add without intermediate rounding.
----------------
Don't see where isFMADLegal is fedined

================
Comment at: llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp:4107-4119
+  unsigned SwapPriority = 0;
+  if (isContractableFMUL(*MI0, AllowFusionGlobally) &&
+      isContractableFMUL(*MI1, AllowFusionGlobally)) {
+    if (std::distance(
+          MRI.use_instr_nodbg_begin(MI0->getOperand(0).getReg()),
+          MRI.use_instr_nodbg_end()) >
+        std::distance(
----------------
I'm not sure I follow this heuristic, or what SwapPriority means

================
Comment at: llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp:4160-4161
+
+  LLT SrcTy1 = MRI.getType(Src1);
+  LLT SrcTy3 = MRI.getType(Src3);
+  if (HasFirstFMUL) {
----------------
The types are all identical, there's no reason to query every type

================
Comment at: llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp:4164
+    Register NegSrc = MRI.createGenericVirtualRegister(SrcTy3);
+    Builder.buildFNeg(NegSrc, Src3);
+    Src3 = NegSrc;
----------------
You can directly use the type and avoid the explicit createGenericVirtualRegister with
auto Neg = B.buildFNeg(Ty, X)

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-sub-mul.ll:117-118
+; GFX9-CONTRACT-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-CONTRACT-NEXT:    v_xor_b32_e32 v2, 0x8000, v2
+; GFX9-CONTRACT-NEXT:    v_fma_f16 v0, v0, v1, v2
+; GFX9-CONTRACT-NEXT:    s_setpc_b64 s[30:31]
----------------
Why did we fail to fold the modifier here?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96614/new/

https://reviews.llvm.org/D96614