[PATCH] D13710: New X86 FMA3*_Int opcodes for scalar FMA intrinsics.

Tue Nov 3 07:02:04 PST 2015

delena added inline comments.

================
Comment at: llvm/lib/Target/X86/X86InstrInfo.cpp:1815
@@ -1796,2 +1814,3 @@
     { X86::VFNMSUBSSr231r,        X86::VFNMSUBSSr231m,        TB_ALIGN_NONE },
+    { X86::VFNMSUBSSr231r_Int,    X86::VFNMSUBSSr231m_Int,    TB_ALIGN_NONE },
     { X86::VFNMSUBSDr231r,        X86::VFNMSUBSDr231m,        TB_ALIGN_NONE },
----------------
I don't understand how you can use the 231 form for scalar intrinsic:

intr_fmadd_ss( a, b, c) may be translated as

VFMADD213SS a, b, c
or
VFMADD132SS a, c, b

but you can't generate VFMADD231SS because "a" should go first, you are taking the upper part from it.

================
Comment at: llvm/test/CodeGen/X86/fma-intrinsics-phi-213-to-231.ll:171
@@ +170,3 @@
+; CHECK-NEXT: retq
+define <4 x float> @fmaddsubps_loop_128(i32 %iter, <4 x float> %a, <4 x float> %b, <4 x float> %c) {
+entry:
----------------
The test checks that  FMA intrinsic gives the right form of FMA instruction.
I don't understand why do you need a loop here. We wrote a lot of FMA intrinsic tests without any loops.

================
Comment at: llvm/test/CodeGen/X86/fma-intrinsics-x86.ll:485
@@ +484,3 @@
+; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}
+; CHECK-FMA-WIN-NEXT: vfnmsub213sd (%r8), %xmm1, %xmm0
+;
----------------
you check folding vector load into scalar intrinsic.
On AVX-512 we support folding scalar load to scalar intrinsic., by matching scalar_to_vector(loadf32) pattern in td file


http://reviews.llvm.org/D13710