[PATCH] D39851: [X86] Add separate intrinsics for scalar FMA4 instructions.

Craig Topper via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Nov 25 09:57:18 PST 2017


craig.topper added inline comments.


================
Comment at: lib/Target/X86/X86Subtarget.h:466
   // has equal or better performance on all supported targets.
-  bool hasFMA() const { return HasFMA && !HasFMA4; }
+  bool hasFMA() const { return HasFMA; }
   bool hasFMA4() const { return HasFMA4; }
----------------
RKSimon wrote:
> This change concerns me - bdver2/bdver3 both support FMA3 as well as FMA4 but via a microcoding hack that costs extra cycles to perform, hence the preference for FMA4.
I'm still giving priority to FMA4 for the generic fma intrinsic and the packed x86 intrinsics. I'm just doing it now by including NoFMA4 in the "Requires" line in X86InstrFormats.td, rather than by folding the !HasFMA4 check into hasFMA().
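
To make the intent concrete, here is a rough standalone model of the new arrangement. It is only a sketch, not the actual LLVM code: the Subtarget struct, the FMAForm enum, and selectGenericFMA are made-up names for illustration, and the real selection is driven by TableGen predicates rather than a C++ function.

```cpp
// Hypothetical stand-in for X86Subtarget; only the two feature bits matter here.
struct Subtarget {
  bool HasFMA;   // FMA3 is available
  bool HasFMA4;  // FMA4 is available

  // After the patch: hasFMA() reports FMA3 support on its own,
  // without hiding it when FMA4 is also present.
  constexpr bool hasFMA() const { return HasFMA; }
  constexpr bool hasFMA4() const { return HasFMA4; }
};

// Illustrative result type (not an LLVM enum).
enum class FMAForm { None, FMA3, FMA4 };

// Models the predicates on the generic fma / packed intrinsic patterns:
//   FMA4 patterns: require HasFMA4
//   FMA3 patterns: require HasFMA and NoFMA4
// so FMA4 still wins whenever both features are present.
constexpr FMAForm selectGenericFMA(Subtarget ST) {
  return ST.hasFMA4() ? FMAForm::FMA4
         : ST.hasFMA() ? FMAForm::FMA3  // only reached when NoFMA4 holds
                       : FMAForm::None;
}
```

With both bits set (as on bdver2/bdver3), hasFMA() now returns true, yet this model still picks FMA4 for the generic case, which is the behaviour RKSimon was asking to preserve.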


https://reviews.llvm.org/D39851




