[PATCH] D39851: [X86] Add separate intrinsics for scalar FMA4 instructions.
Craig Topper via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Nov 9 10:34:51 PST 2017
craig.topper created this revision.
These instructions zero the non-scalar part of the lower 128 bits, which makes them different from the FMA3 instructions, which pass the non-scalar part of the lower 128 bits through unchanged.
I've only added fmadd because we should be able to derive all other variants using operand negation in the intrinsic header, like we do for AVX512.
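To make the difference concrete, here is a minimal sketch that models the two behaviors in plain C++ rather than with real intrinsics. The `Vec4` type and the `model_*` function names are hypothetical and are not part of the patch. The FMA3 model assumes the pass-through lanes come from the first operand, as with the `_mm_fmadd_ss` intrinsic.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical model of the lower 128 bits of a register: four float lanes.
using Vec4 = std::array<float, 4>;

// FMA3-style scalar fmadd (model): lane 0 is computed, lanes 1..3 are
// passed through. Assumption: they come from the first operand.
Vec4 model_fma3_fmadd_ss(const Vec4& a, const Vec4& b, const Vec4& c) {
    Vec4 r = a;                          // upper lanes pass through
    r[0] = std::fma(a[0], b[0], c[0]);   // scalar lane: a*b + c
    return r;
}

// FMA4-style scalar fmadd (model): lane 0 is computed, lanes 1..3 are zeroed.
Vec4 model_fma4_fmadd_ss(const Vec4& a, const Vec4& b, const Vec4& c) {
    Vec4 r{};                            // value-initialized: all lanes zero
    r[0] = std::fma(a[0], b[0], c[0]);   // scalar lane: a*b + c
    return r;
}

// Small self-check: same scalar result, different upper lanes.
inline void check_lane_models() {
    const Vec4 a{2.0f, 10.0f, 20.0f, 30.0f};
    const Vec4 b{3.0f, 0.0f, 0.0f, 0.0f};
    const Vec4 c{4.0f, 0.0f, 0.0f, 0.0f};
    const Vec4 r3 = model_fma3_fmadd_ss(a, b, c);
    const Vec4 r4 = model_fma4_fmadd_ss(a, b, c);
    assert(r3[0] == r4[0]);              // scalar lane agrees
    assert(r3[1] == 10.0f);              // FMA3 model keeps a[1]
    assert(r4[1] == 0.0f);               // FMA4 model zeroes it
}
```

Because the upper lanes differ, a single shared intrinsic cannot describe both instruction families, which is the motivation for separate FMA4 intrinsics.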
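The derivation by operand negation can be sketched on plain scalars as follows. This is only an illustration of the arithmetic identities, not the intrinsic header itself, and the `sketch_*` names are hypothetical.

```cpp
#include <cmath>

// Base operation: a*b + c, computed with a single rounding.
inline float sketch_fmadd(float a, float b, float c) {
    return std::fma(a, b, c);
}

// fmsub:  a*b - c     == fmadd(a, b, -c)
inline float sketch_fmsub(float a, float b, float c) {
    return sketch_fmadd(a, b, -c);
}

// fnmadd: -(a*b) + c  == fmadd(-a, b, c)
inline float sketch_fnmadd(float a, float b, float c) {
    return sketch_fmadd(-a, b, c);
}

// fnmsub: -(a*b) - c  == fmadd(-a, b, -c)
inline float sketch_fnmsub(float a, float b, float c) {
    return sketch_fmadd(-a, b, -c);
}
```

With this approach only fmadd needs an intrinsic, and the backend is expected to fold the negations back into the matching instruction variant.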
I think there are still some missed negate-folding and load-folding opportunities with the FMA4 instructions in light of this behavior difference, which I hadn't noticed before.
I've split the tests so that the FMA3 and FMA4 scalar tests can use different intrinsics. I just copied the tests, split the RUN lines, and swapped out the scalar intrinsics.
fma4-fneg-combine.ll is a new test to make sure we negate the FMA4 intrinsics correctly, though there are a couple of TODOs in it.
https://reviews.llvm.org/D39851
Files:
include/llvm/IR/IntrinsicsX86.td
lib/Target/X86/X86ISelLowering.cpp
lib/Target/X86/X86ISelLowering.h
lib/Target/X86/X86InstrFMA.td
lib/Target/X86/X86InstrFormats.td
lib/Target/X86/X86InstrFragmentsSIMD.td
lib/Target/X86/X86InstrInfo.td
lib/Target/X86/X86IntrinsicsInfo.h
lib/Target/X86/X86Subtarget.h
test/CodeGen/X86/fma-commute-x86.ll
test/CodeGen/X86/fma-intrinsics-x86.ll
test/CodeGen/X86/fma-scalar-memfold.ll
test/CodeGen/X86/fma4-commute-x86.ll
test/CodeGen/X86/fma4-fneg-combine.ll
test/CodeGen/X86/fma4-intrinsics-x86.ll
test/CodeGen/X86/fma4-intrinsics-x86_64-folded-load.ll
test/CodeGen/X86/fma4-scalar-memfold.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D39851.122269.patch
Type: text/x-patch
Size: 147686 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20171109/abb7919a/attachment-0001.bin>