[PATCH] D83981: [x86] split FMA with fast-math-flags to avoid libcall

Sanjay Patel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jul 16 12:56:01 PDT 2020


spatel created this revision.
spatel added reviewers: craig.topper, cameron.mcinally, RKSimon.
Herald added subscribers: steven.zhang, hiraditya, mcrosier.
Herald added a project: LLVM.

fma reassoc A, B, C --> fadd (fmul A, B), C (when target has no FMA hardware)

C/C++ code may use explicit fma() calls (which become LLVM fma intrinsics in IR) but then get compiled with -ffast-math or similar.
For targets that do not have FMA hardware, we don't want to go out to the math library for a precise but slow FMA result.
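
To illustrate what "precise" means here, the following is a minimal sketch (not part of the patch) comparing a fused multiply-add with the split mul+add form on inputs where the two disagree. The helper names fused, split, and self_check are made up for this example, and it assumes IEEE-754 doubles with default rounding and a build without -ffast-math. The volatile store is only there to stop the compiler from contracting the split form back into an fma.

```cpp
#include <cassert>
#include <cmath>

// Fused form: a*b is kept exact and only the final sum is rounded.
double fused(double a, double b, double c) { return std::fma(a, b, c); }

// Split form: the product is rounded to double first, then the sum is rounded.
// 'volatile' forces the intermediate rounding so the two ops are not re-fused.
double split(double a, double b, double c) {
  volatile double product = a * b;
  return product + c;
}

// Hypothetical check on hand-picked inputs (assumes IEEE-754 binary64).
void self_check() {
  const double eps = std::ldexp(1.0, -30);  // 2^-30
  const double a = 1.0 + eps;               // a*a = 1 + 2^-29 + 2^-60 exactly
  const double c = -(1.0 + 2.0 * eps);      // cancels everything but 2^-60
  assert(fused(a, a, c) == std::ldexp(1.0, -60));  // low bits survive
  assert(split(a, a, c) == 0.0);                   // low bits rounded away
}
```

The split form loses the 2^-60 term, which is the kind of difference that reassoc already permits, so paying for a libcall to preserve it buys nothing in that mode.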

I tried this as a generic DAGCombine, but it caused infinite looping on more than one other target, so there's likely some over-reaching fma formation happening on those targets.

There's also a potential intersection of strict FP with fast-math here. I'm not sure who should win that fight, so I'm just deferring to current behavior for that case (the new split is skipped for strict nodes).


https://reviews.llvm.org/D83981

Files:
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/test/CodeGen/X86/fma.ll


Index: llvm/test/CodeGen/X86/fma.ll
===================================================================
--- llvm/test/CodeGen/X86/fma.ll
+++ llvm/test/CodeGen/X86/fma.ll
@@ -73,9 +73,15 @@
 ;
 ; FMACALL32-LABEL: test_f32_reassoc:
 ; FMACALL32:       ## %bb.0:
-; FMACALL32-NEXT:    jmp _fmaf ## TAILCALL
-; FMACALL32-NEXT:    ## encoding: [0xeb,A]
-; FMACALL32-NEXT:    ## fixup A - offset: 1, value: _fmaf-1, kind: FK_PCRel_1
+; FMACALL32-NEXT:    pushl %eax ## encoding: [0x50]
+; FMACALL32-NEXT:    vmovss {{[0-9]+}}(%esp), %xmm0 ## encoding: [0xc5,0xfa,0x10,0x44,0x24,0x08]
+; FMACALL32-NEXT:    ## xmm0 = mem[0],zero,zero,zero
+; FMACALL32-NEXT:    vmulss {{[0-9]+}}(%esp), %xmm0, %xmm0 ## encoding: [0xc5,0xfa,0x59,0x44,0x24,0x0c]
+; FMACALL32-NEXT:    vaddss {{[0-9]+}}(%esp), %xmm0, %xmm0 ## encoding: [0xc5,0xfa,0x58,0x44,0x24,0x10]
+; FMACALL32-NEXT:    vmovss %xmm0, (%esp) ## encoding: [0xc5,0xfa,0x11,0x04,0x24]
+; FMACALL32-NEXT:    flds (%esp) ## encoding: [0xd9,0x04,0x24]
+; FMACALL32-NEXT:    popl %eax ## encoding: [0x58]
+; FMACALL32-NEXT:    retl ## encoding: [0xc3]
 ;
 ; FMA64-LABEL: test_f32_reassoc:
 ; FMA64:       ## %bb.0:
@@ -85,9 +91,9 @@
 ;
 ; FMACALL64-LABEL: test_f32_reassoc:
 ; FMACALL64:       ## %bb.0:
-; FMACALL64-NEXT:    jmp _fmaf ## TAILCALL
-; FMACALL64-NEXT:    ## encoding: [0xeb,A]
-; FMACALL64-NEXT:    ## fixup A - offset: 1, value: _fmaf-1, kind: FK_PCRel_1
+; FMACALL64-NEXT:    mulss %xmm1, %xmm0 ## encoding: [0xf3,0x0f,0x59,0xc1]
+; FMACALL64-NEXT:    addss %xmm2, %xmm0 ## encoding: [0xf3,0x0f,0x58,0xc2]
+; FMACALL64-NEXT:    retq ## encoding: [0xc3]
 ;
 ; AVX512-LABEL: test_f32_reassoc:
 ; AVX512:       ## %bb.0:
Index: llvm/lib/Target/X86/X86ISelLowering.cpp
===================================================================
--- llvm/lib/Target/X86/X86ISelLowering.cpp
+++ llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -46129,14 +46129,23 @@
   if (!TLI.isTypeLegal(VT))
     return SDValue();
 
-  EVT ScalarVT = VT.getScalarType();
-  if ((ScalarVT != MVT::f32 && ScalarVT != MVT::f64) || !Subtarget.hasAnyFMA())
-    return SDValue();
-
   SDValue A = N->getOperand(IsStrict ? 1 : 0);
   SDValue B = N->getOperand(IsStrict ? 2 : 1);
   SDValue C = N->getOperand(IsStrict ? 3 : 2);
 
+  // If the operation allows fast-math and the target does not support FMA,
+  // split this into mul+add to avoid a libcall.
+  SDNodeFlags Flags = N->getFlags();
+  if (!IsStrict && Flags.hasAllowReassociation() &&
+      TLI.isOperationExpand(ISD::FMA, VT)) {
+    SDValue Fmul = DAG.getNode(ISD::FMUL, dl, VT, A, B, Flags);
+    return DAG.getNode(ISD::FADD, dl, VT, Fmul, C, Flags);
+  }
+
+  EVT ScalarVT = VT.getScalarType();
+  if ((ScalarVT != MVT::f32 && ScalarVT != MVT::f64) || !Subtarget.hasAnyFMA())
+    return SDValue();
+
   auto invertIfNegative = [&DAG, &TLI, &DCI](SDValue &V) {
     bool CodeSize = DAG.getMachineFunction().getFunction().hasOptSize();
     bool LegalOperations = !DCI.isBeforeLegalizeOps();

