[PATCH] D83981: [x86] split FMA with fast-math-flags to avoid libcall
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jul 16 12:56:01 PDT 2020
spatel created this revision.
spatel added reviewers: craig.topper, cameron.mcinally, RKSimon.
Herald added subscribers: steven.zhang, hiraditya, mcrosier.
Herald added a project: LLVM.
fma reassoc A, B, C --> fadd (fmul A, B), C (when target has no FMA hardware)
C/C++ code may use explicit fma() calls (which become LLVM fma intrinsics in IR) but then get compiled with -ffast-math or similar.
For targets that do not have FMA hardware, we don't want to go out to the math library for a precise but slow FMA result.
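To make the precision trade-off concrete, here is a minimal standalone C++ sketch (not part of the patch) that compares the split form with the fused form. The helper names split_mul_add and fused_mul_add are hypothetical, and the sketch assumes the default round-to-nearest mode. The volatile store is there only to keep the compiler from contracting the two operations back into an FMA.

```cpp
#include <cmath>

// Hypothetical helper (not from the patch): models fadd (fmul A, B), C.
// The volatile store forces the product to be rounded to float and keeps
// the compiler from fusing the multiply and add.
float split_mul_add(float a, float b, float c) {
  volatile float product = a * b; // first rounding happens here
  return product + c;             // second rounding happens here
}

// Fused form: std::fma computes a * b + c exactly and rounds once.
float fused_mul_add(float a, float b, float c) {
  return std::fma(a, b, c);
}
```

With a = b = 1 + 2^-12 and c = -(1 + 2^-11), the exact product is 1 + 2^-11 + 2^-24, which rounds to 1 + 2^-11 as a float. The split form therefore returns 0, while the fused form returns 2^-24. That lost low-order term is the precision given up in exchange for avoiding the libcall.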
I tried this as a generic DAGCombine, but it caused infinite looping on more than one other target, so there's likely some over-reaching fma formation happening.
There's also a potential intersection of strict FP with fast-math here. I'm not sure who should win that fight, so I'm just deferring to the current behavior for that case.
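The condition that gates the split can be modeled outside LLVM as a plain predicate. This is only an illustrative sketch: FmaNodeInfo and shouldSplitFma are hypothetical names, and the three booleans stand in for the checks the patch performs (not a strict node, reassoc flag set, ISD::FMA marked Expand for the type).

```cpp
// Hypothetical stand-in for the facts the patch inspects; not an LLVM type.
struct FmaNodeInfo {
  bool isStrict;           // node is a strict (constrained) FP fma
  bool allowReassociation; // 'reassoc' fast-math flag is set on the node
  bool fmaWouldExpand;     // target marks ISD::FMA as Expand for this type
};

// Returns true when the fma would be rewritten as fadd (fmul A, B), C.
// Strict nodes are always left alone, matching the "defer to current
// behavior" choice described above.
bool shouldSplitFma(const FmaNodeInfo &n) {
  return !n.isStrict && n.allowReassociation && n.fmaWouldExpand;
}
```

Under this model, a strict fma is never split even when it carries fast-math flags, and a target with FMA hardware is unaffected because the operation is not expanded there.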
https://reviews.llvm.org/D83981
Files:
llvm/lib/Target/X86/X86ISelLowering.cpp
llvm/test/CodeGen/X86/fma.ll
Index: llvm/test/CodeGen/X86/fma.ll
===================================================================
--- llvm/test/CodeGen/X86/fma.ll
+++ llvm/test/CodeGen/X86/fma.ll
@@ -73,9 +73,15 @@
;
; FMACALL32-LABEL: test_f32_reassoc:
; FMACALL32: ## %bb.0:
-; FMACALL32-NEXT: jmp _fmaf ## TAILCALL
-; FMACALL32-NEXT: ## encoding: [0xeb,A]
-; FMACALL32-NEXT: ## fixup A - offset: 1, value: _fmaf-1, kind: FK_PCRel_1
+; FMACALL32-NEXT: pushl %eax ## encoding: [0x50]
+; FMACALL32-NEXT: vmovss {{[0-9]+}}(%esp), %xmm0 ## encoding: [0xc5,0xfa,0x10,0x44,0x24,0x08]
+; FMACALL32-NEXT: ## xmm0 = mem[0],zero,zero,zero
+; FMACALL32-NEXT: vmulss {{[0-9]+}}(%esp), %xmm0, %xmm0 ## encoding: [0xc5,0xfa,0x59,0x44,0x24,0x0c]
+; FMACALL32-NEXT: vaddss {{[0-9]+}}(%esp), %xmm0, %xmm0 ## encoding: [0xc5,0xfa,0x58,0x44,0x24,0x10]
+; FMACALL32-NEXT: vmovss %xmm0, (%esp) ## encoding: [0xc5,0xfa,0x11,0x04,0x24]
+; FMACALL32-NEXT: flds (%esp) ## encoding: [0xd9,0x04,0x24]
+; FMACALL32-NEXT: popl %eax ## encoding: [0x58]
+; FMACALL32-NEXT: retl ## encoding: [0xc3]
;
; FMA64-LABEL: test_f32_reassoc:
; FMA64: ## %bb.0:
@@ -85,9 +91,9 @@
;
; FMACALL64-LABEL: test_f32_reassoc:
; FMACALL64: ## %bb.0:
-; FMACALL64-NEXT: jmp _fmaf ## TAILCALL
-; FMACALL64-NEXT: ## encoding: [0xeb,A]
-; FMACALL64-NEXT: ## fixup A - offset: 1, value: _fmaf-1, kind: FK_PCRel_1
+; FMACALL64-NEXT: mulss %xmm1, %xmm0 ## encoding: [0xf3,0x0f,0x59,0xc1]
+; FMACALL64-NEXT: addss %xmm2, %xmm0 ## encoding: [0xf3,0x0f,0x58,0xc2]
+; FMACALL64-NEXT: retq ## encoding: [0xc3]
;
; AVX512-LABEL: test_f32_reassoc:
; AVX512: ## %bb.0:
Index: llvm/lib/Target/X86/X86ISelLowering.cpp
===================================================================
--- llvm/lib/Target/X86/X86ISelLowering.cpp
+++ llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -46129,14 +46129,23 @@
if (!TLI.isTypeLegal(VT))
return SDValue();
- EVT ScalarVT = VT.getScalarType();
- if ((ScalarVT != MVT::f32 && ScalarVT != MVT::f64) || !Subtarget.hasAnyFMA())
- return SDValue();
-
SDValue A = N->getOperand(IsStrict ? 1 : 0);
SDValue B = N->getOperand(IsStrict ? 2 : 1);
SDValue C = N->getOperand(IsStrict ? 3 : 2);
+ // If the operation allows fast-math and the target does not support FMA,
+ // split this into mul+add to avoid a libcall.
+ SDNodeFlags Flags = N->getFlags();
+ if (!IsStrict && Flags.hasAllowReassociation() &&
+ TLI.isOperationExpand(ISD::FMA, VT)) {
+ SDValue Fmul = DAG.getNode(ISD::FMUL, dl, VT, A, B, Flags);
+ return DAG.getNode(ISD::FADD, dl, VT, Fmul, C, Flags);
+ }
+
+ EVT ScalarVT = VT.getScalarType();
+ if ((ScalarVT != MVT::f32 && ScalarVT != MVT::f64) || !Subtarget.hasAnyFMA())
+ return SDValue();
+
auto invertIfNegative = [&DAG, &TLI, &DCI](SDValue &V) {
bool CodeSize = DAG.getMachineFunction().getFunction().hasOptSize();
bool LegalOperations = !DCI.isBeforeLegalizeOps();