[PATCH] D39851: [X86] Add separate intrinsics for scalar FMA4 instructions.
Craig Topper via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat Nov 25 09:57:18 PST 2017
craig.topper added inline comments.
================
Comment at: lib/Target/X86/X86Subtarget.h:466
// has equal or better performance on all supported targets.
- bool hasFMA() const { return HasFMA && !HasFMA4; }
+ bool hasFMA() const { return HasFMA; }
bool hasFMA4() const { return HasFMA4; }
----------------
RKSimon wrote:
> This change concerns me - bdver2/bdver3 both support FMA3 as well as FMA4 but via a microcoding hack that costs extra cycles to perform, hence the preference for FMA4.
I'm still giving priority to FMA4 for the generic fma intrinsic and the packed x86 intrinsics; I'm just doing it now by including NoFMA4 in the "Requires" line in X86InstrFormats.td instead of masking FMA3 out of hasFMA().
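The reply can be checked with a small truth table. What follows is a minimal, hypothetical C++ model and not LLVM code: `FeatureBits`, `fma3PatternsEnabledOld`, `fma3PatternsEnabledNew`, and `gatingUnchanged` are illustrative names. It assumes that FMA4 patterns are gated on HasFMA4 alone, and that adding NoFMA4 to the FMA3 "Requires" list behaves like the predicate `HasFMA && !HasFMA4`.

```cpp
#include <array>
#include <cassert>

// Hypothetical, simplified model of the two X86 subtarget feature bits
// discussed in this review; not LLVM's actual X86Subtarget class.
struct FeatureBits {
  bool HasFMA;   // FMA3 support
  bool HasFMA4;  // FMA4 support
};

// hasFMA() before the patch: FMA3 is hidden whenever FMA4 is present.
bool hasFMAOld(const FeatureBits &F) { return F.HasFMA && !F.HasFMA4; }

// hasFMA() after the patch: reports the FMA3 feature as-is.
bool hasFMANew(const FeatureBits &F) { return F.HasFMA; }

bool hasFMA4(const FeatureBits &F) { return F.HasFMA4; }

// Assumption: FMA4 patterns are gated on HasFMA4 alone in both versions.
bool fma4PatternsEnabled(const FeatureBits &F) { return hasFMA4(F); }

// Old gating: FMA3 patterns require only hasFMA(), which already excluded FMA4.
bool fma3PatternsEnabledOld(const FeatureBits &F) { return hasFMAOld(F); }

// New gating (assumed): FMA3 patterns require [HasFMA, NoFMA4], i.e. the
// NoFMA4 check has moved out of hasFMA() and into the "Requires" list.
bool fma3PatternsEnabledNew(const FeatureBits &F) {
  return hasFMANew(F) && !hasFMA4(F);
}

// Returns true if, for all four feature combinations, the new gating enables
// the same FMA3 patterns as the old one and never enables FMA3 and FMA4
// patterns together (so FMA4 keeps priority).
bool gatingUnchanged() {
  const std::array<FeatureBits, 4> Combos = {{
      {false, false}, {true, false}, {false, true}, {true, true}}};
  for (const FeatureBits &F : Combos) {
    if (fma3PatternsEnabledOld(F) != fma3PatternsEnabledNew(F))
      return false;
    if (fma3PatternsEnabledNew(F) && fma4PatternsEnabled(F))
      return false;
  }
  return true;
}
```

Under these assumptions, on a target with both flags set (as on bdver2/bdver3), `hasFMANew()` now returns true while the FMA3 patterns stay disabled. The FMA4 preference is unchanged; only the place where it is enforced has moved.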
https://reviews.llvm.org/D39851