[PATCH] D39851: [X86] Add separate intrinsics for scalar FMA4 instructions.

Thu Nov 9 10:34:51 PST 2017

craig.topper created this revision.

These instructions zero the non-scalar part of the lower 128 bits, which makes them different from the FMA3 instructions, which pass the non-scalar part of the lower 128 bits through.
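
To make the difference concrete, here is a rough C model of the two scalar behaviors. It is only a sketch: the `v4f` type and the `model_*` functions are made up for illustration and are not part of the patch. It assumes the FMA3 pass-through lanes come from the first operand, as with the `_mm_fmadd_ss` intrinsic, and it uses a plain multiply-add in place of the fused operation, so single-rounding behavior is not modeled.

```c
/* Hypothetical 4 x float vector type, used only for this illustration. */
typedef struct { float lane[4]; } v4f;

/* FMA3-style scalar fmadd: lane 0 is computed, lanes 1..3 are passed
 * through unchanged from the first operand (assumption stated above). */
static v4f model_fma3_fmadd_ss(v4f a, v4f b, v4f c) {
    v4f r = a;                                       /* keep a's upper lanes */
    r.lane[0] = a.lane[0] * b.lane[0] + c.lane[0];   /* not fused here */
    return r;
}

/* FMA4-style scalar fmadd: lane 0 is computed, lanes 1..3 are zeroed. */
static v4f model_fma4_fmadd_ss(v4f a, v4f b, v4f c) {
    v4f r = {{0.0f, 0.0f, 0.0f, 0.0f}};              /* upper lanes zeroed */
    r.lane[0] = a.lane[0] * b.lane[0] + c.lane[0];   /* not fused here */
    return r;
}
```

With the same inputs, the two models agree in lane 0 and differ only in lanes 1..3, which is why a single intrinsic cannot describe both instruction sets.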

I've only added fmadd because we should be able to derive all other variants using operand negation in the intrinsic header like we do for AVX512.
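
As a sketch of that derivation, the other three variants can be written on top of a single fmadd by negating operands. The `f4` type and all function names below are hypothetical and are not the real header or intrinsic names; which operand a real header negates for the fnm* forms is an assumption made here for illustration.

```c
/* Hypothetical 4 x float vector type, used only for this illustration. */
typedef struct { float lane[4]; } f4;

/* Negate every lane, as unary minus on a vector operand would. */
static f4 f4_neg(f4 v) {
    for (int i = 0; i < 4; ++i)
        v.lane[i] = -v.lane[i];
    return v;
}

/* Model of the scalar FMA4 fmadd: lane 0 = a*b + c, lanes 1..3 zeroed.
 * A plain multiply-add stands in for the fused operation. */
static f4 model_fmadd_ss(f4 a, f4 b, f4 c) {
    f4 r = {{0.0f, 0.0f, 0.0f, 0.0f}};
    r.lane[0] = a.lane[0] * b.lane[0] + c.lane[0];
    return r;
}

/* fmsub:  a*b - c  ==  fmadd(a, b, -c) */
static f4 model_fmsub_ss(f4 a, f4 b, f4 c) {
    return model_fmadd_ss(a, b, f4_neg(c));
}

/* fnmadd: -(a*b) + c  ==  fmadd(-a, b, c) */
static f4 model_fnmadd_ss(f4 a, f4 b, f4 c) {
    return model_fmadd_ss(f4_neg(a), b, c);
}

/* fnmsub: -(a*b) - c  ==  fmadd(-a, b, -c) */
static f4 model_fnmsub_ss(f4 a, f4 b, f4 c) {
    return model_fmadd_ss(f4_neg(a), b, f4_neg(c));
}
```

In this model the upper lanes are zeroed regardless of the inputs, so negating a whole operand vector does not change lanes 1..3 of the result.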

I think there are still some missed negate folding and load folding opportunities with the FMA4 instructions in light of this behavior difference that I hadn't noticed before.

I've split the tests so that we can use different intrinsics for scalar testing between the two. I just copied the tests, split the RUN lines, and changed out the scalar intrinsics.

fma4-fneg-combine.ll is a new test to make sure we negate the fma4 intrinsics correctly, though there are a couple of TODOs in it.

https://reviews.llvm.org/D39851

Files:
  include/llvm/IR/IntrinsicsX86.td
  lib/Target/X86/X86ISelLowering.cpp
  lib/Target/X86/X86ISelLowering.h
  lib/Target/X86/X86InstrFMA.td
  lib/Target/X86/X86InstrFormats.td
  lib/Target/X86/X86InstrFragmentsSIMD.td
  lib/Target/X86/X86InstrInfo.td
  lib/Target/X86/X86IntrinsicsInfo.h
  lib/Target/X86/X86Subtarget.h
  test/CodeGen/X86/fma-commute-x86.ll
  test/CodeGen/X86/fma-intrinsics-x86.ll
  test/CodeGen/X86/fma-scalar-memfold.ll
  test/CodeGen/X86/fma4-commute-x86.ll
  test/CodeGen/X86/fma4-fneg-combine.ll
  test/CodeGen/X86/fma4-intrinsics-x86.ll
  test/CodeGen/X86/fma4-intrinsics-x86_64-folded-load.ll
  test/CodeGen/X86/fma4-scalar-memfold.ll
