[all-commits] [llvm/llvm-project] 702cf9: [DAGCombiner] allow more folding of fadd + fmul in...
RotateRight via All-commits
all-commits at lists.llvm.org
Tue Jun 9 07:41:53 PDT 2020
Branch: refs/heads/master
Home: https://github.com/llvm/llvm-project
Commit: 702cf933565ea942c5feb7521c89b237f281c4f3
https://github.com/llvm/llvm-project/commit/702cf933565ea942c5feb7521c89b237f281c4f3
Author: Sanjay Patel <spatel at rotateright.com>
Date: 2020-06-09 (Tue, 09 Jun 2020)
Changed paths:
M llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
M llvm/test/CodeGen/AArch64/fadd-combines.ll
M llvm/test/CodeGen/X86/fma_patterns.ll
Log Message:
-----------
[DAGCombiner] allow more folding of fadd + fmul into fma
If fmul and fadd are separated by an fma, we can fold them together
to save an instruction:
fadd (fma A, B, (fmul C, D)), N1 --> fma(A, B, fma(C, D, N1))
The fold implemented here is actually a specialization: we should
be able to peek through more than one fma to find this pattern. That
enhancement would be a separate patch, if we want to try it.
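As a rough illustration of the fold above, here is a minimal C++ sketch that spells out the two forms using std::fma. The helper names before_fold and after_fold are hypothetical and are not part of the patch; the actual transform operates on SelectionDAG nodes in DAGCombiner.cpp, not on source code.

```cpp
#include <cmath>

// Hypothetical helper: the pattern before the fold, one operation per line.
// fadd (fma A, B, (fmul C, D)), N1  -> three floating-point operations.
double before_fold(double a, double b, double c, double d, double n1) {
    double mul = c * d;                  // fmul C, D
    double inner = std::fma(a, b, mul);  // fma A, B, mul
    return inner + n1;                   // fadd inner, N1
}

// Hypothetical helper: the pattern after the fold.
// fma(A, B, fma(C, D, N1))  -> two floating-point operations.
double after_fold(double a, double b, double c, double d, double n1) {
    return std::fma(a, b, std::fma(c, d, n1));
}
```

For small, exactly representable inputs such as (2, 3, 4, 5, 6), both helpers return 32. In general the two forms round at different points, so the results may differ in the low bits, which is consistent with the transform needing permission to contract.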
This transform was previously guarded by the TLI hook enableAggressiveFMAFusion(),
so it was done for some in-tree targets like PowerPC, but not AArch64
or x86. The hook is protecting against forming a potentially more
expensive computation when fma takes longer to execute than a single
fadd. That hook may be needed for other transforms, but in this case,
we are replacing fmul+fadd with fma, and the fma should never take
longer than the 2 individual instructions.
The 'contract' fast-math flag (FMF) is all we need to allow this transform. That flag
corresponds to -ffp-contract=fast in Clang, so we are allowed to form
fma ops freely across expressions.
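To make "across expressions" concrete, here is a sketch of source code whose operations are spread over separate statements. The function name sum_of_products is hypothetical, and the assumption is that building it with Clang and -ffp-contract=fast marks each operation as contractible; which instructions are actually emitted depends on the compiler version, target, and flags, and is not verified here.

```cpp
// Hypothetical function; the name and the split into statements are
// illustrative only and do not come from the patch or its tests.
double sum_of_products(double a, double b, double c, double d, double n1) {
    double cd = c * d;        // candidate fmul C, D
    double sum = a * b + cd;  // candidate for the outer fma A, B, cd
    return sum + n1;          // candidate fadd that the fold would absorb
}
```

Under those assumptions, the three statements are candidates for the two nested fma operations described in the log message, even though no single source expression contains both a multiply and the final add.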
Differential Revision: https://reviews.llvm.org/D80801