[PATCH] D96614: [AMDGPU][GlobalISel] Transform (fsub (fmul x, y), z) -> (fma x, y, -z)
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Feb 12 09:39:08 PST 2021
arsenm added inline comments.
================
Comment at: llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp:4085-4086
+
+ bool LegalOperations =
+ isLegal({TargetOpcode::G_FADD, {DstType, SrcType}});
+ // Floating-point multiply-add with intermediate rounding.
----------------
This should probably allow vectors we can break down later too
================
Comment at: llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp:4088
+ // Floating-point multiply-add with intermediate rounding.
+ bool HasFMAD = (LegalOperations && TLI.isFMADLegal(MI, DstType));
+ // Floating-point multiply-add without intermediate rounding.
----------------
Don't see where isFMADLegal is fedined
================
Comment at: llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp:4107-4119
+ unsigned SwapPriority = 0;
+ if (isContractableFMUL(*MI0, AllowFusionGlobally) &&
+ isContractableFMUL(*MI1, AllowFusionGlobally)) {
+ if (std::distance(
+ MRI.use_instr_nodbg_begin(MI0->getOperand(0).getReg()),
+ MRI.use_instr_nodbg_end()) >
+ std::distance(
----------------
I'm not sure I follow this heuristic, or what SwapPriority means
================
Comment at: llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp:4160-4161
+
+ LLT SrcTy1 = MRI.getType(Src1);
+ LLT SrcTy3 = MRI.getType(Src3);
+ if (HasFirstFMUL) {
----------------
The types are all identical, there's no reason to query every type
================
Comment at: llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp:4164
+ Register NegSrc = MRI.createGenericVirtualRegister(SrcTy3);
+ Builder.buildFNeg(NegSrc, Src3);
+ Src3 = NegSrc;
----------------
You can directly use the type and avoid the explicit createGenericVirtualRegister with
auto Neg = B.buildFNeg(Ty, X)
================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-sub-mul.ll:117-118
+; GFX9-CONTRACT-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-CONTRACT-NEXT: v_xor_b32_e32 v2, 0x8000, v2
+; GFX9-CONTRACT-NEXT: v_fma_f16 v0, v0, v1, v2
+; GFX9-CONTRACT-NEXT: s_setpc_b64 s[30:31]
----------------
Why did we fail to fold the modifier here?
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D96614/new/
https://reviews.llvm.org/D96614
More information about the llvm-commits
mailing list