[PATCH] D13710: New X86 FMA3*_Int opcodes for scalar FMA intrinsics.
Vyacheslav Klochkov via llvm-commits
llvm-commits at lists.llvm.org
Tue Nov 3 11:48:55 PST 2015
v_klochkov added a comment.
Elena,
Please see the answers to your questions.
Thank you,
Slava
================
Comment at: llvm/lib/Target/X86/X86InstrInfo.cpp:1815
@@ -1796,2 +1814,3 @@
{ X86::VFNMSUBSSr231r, X86::VFNMSUBSSr231m, TB_ALIGN_NONE },
+ { X86::VFNMSUBSSr231r_Int, X86::VFNMSUBSSr231m_Int, TB_ALIGN_NONE },
{ X86::VFNMSUBSDr231r, X86::VFNMSUBSDr231m, TB_ALIGN_NONE },
----------------
delena wrote:
> I don't understand how you can use the 231 form for scalar intrinsic:
>
> intr_fmadd_ss( a, b, c) may be translated as
>
> VFMADD213SS a, b, c
> or
> VFMADD132SS a, c, b
>
> but you can't generate VFMADD231SS because "a" should go first, you are taking the upper part from it.
Very good question. In X86InstrFMA.td I intentionally added a comment noting that problem.
Please see line 215 in that file:
// The FMA 231 form can be get only by commuting the 1st operand of 213 or 231
// forms and is possible only after special analysis of all uses of the initial
// instruction. Such analysis do not exist yet and thus introducing the 231
// form of FMA*_Int instructions is done using an optimistic assumption that
// such analysis will be implemented eventually.
BTW, I noticed a misprint in that comment and I'll fix it: "213 or 231" --> "213 or 132".
If ONLY the lowest element of the FMA213 result is used, then it is possible to commute the 1st operand.
Such analysis exists and is used in other compilers.
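To make the operand-order problem concrete, here is a rough sketch in plain C that models the three scalar forms on a toy 4-element register. The names vec4, fmadd132ss, fmadd213ss, fmadd231ss and vec4_equal are hypothetical helpers written only for this illustration; they are not LLVM code. The model uses an ordinary multiply and add, so it ignores the single rounding of a real fused operation.

```c
/* Toy model of a 128-bit register holding four floats (illustration only). */
typedef struct { float e[4]; } vec4;

/* Each model follows the operand order "op xmm1, xmm2, xmm3".
   xmm1 is the destination: only element 0 is computed and
   elements 1..3 are passed through from xmm1 unchanged. */
vec4 fmadd132ss(vec4 x1, vec4 x2, vec4 x3) {
    x1.e[0] = x1.e[0] * x3.e[0] + x2.e[0];
    return x1;
}

vec4 fmadd213ss(vec4 x1, vec4 x2, vec4 x3) {
    x1.e[0] = x2.e[0] * x1.e[0] + x3.e[0];
    return x1;
}

vec4 fmadd231ss(vec4 x1, vec4 x2, vec4 x3) {
    x1.e[0] = x2.e[0] * x3.e[0] + x1.e[0];
    return x1;
}

/* Returns 1 if all four elements match, 0 otherwise. */
int vec4_equal(vec4 p, vec4 q) {
    for (int i = 0; i < 4; ++i)
        if (p.e[i] != q.e[i])
            return 0;
    return 1;
}
```

For intr_fmadd_ss(a, b, c), fmadd213ss(a, b, c) and fmadd132ss(a, c, b) agree in all four elements. The only way to express the same product-plus-addend with the 231 form is fmadd231ss(c, a, b), which gives the same element 0 but takes the upper elements from c instead of a. That is why the 231 form is only safe when nothing reads the upper elements of the result.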
================
Comment at: llvm/test/CodeGen/X86/fma-intrinsics-phi-213-to-231.ll:171
@@ +170,3 @@
+; CHECK-NEXT: retq
+define <4 x float> @fmaddsubps_loop_128(i32 %iter, <4 x float> %a, <4 x float> %b, <4 x float> %c) {
+entry:
----------------
delena wrote:
> The test checks that FMA intrinsic gives the right form of FMA instruction.
> I don't understand why do you need a loop here. We wrote a lot of FMA intrinsic tests without any loops.
The loop is needed to get the right form of the FMA instruction: the 231 form is generated when there is a LOOP DEPENDENCY on the ADD path. The test checks that the 231 form is generated for such loops.
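As a rough scalar sketch of what "loop dependency on the ADD path" means, consider the C function below. The name fma_loop and its exact shape are assumptions made for illustration; the real test is written in LLVM IR on vectors and uses the fmaddsub intrinsic, not a plain multiply-add.

```c
/* Hypothetical scalar analogue of the loop in the test: the addend c
   is carried from one iteration to the next, while the multiplicands
   a and b stay the same. */
float fma_loop(int iter, float a, float b, float c) {
    for (int i = 0; i < iter; ++i) {
        /* c is both the addend and the result, so it is the natural
           destination operand, which matches the 231 form (dst = a * b + dst). */
        c = a * b + c;
    }
    return c;
}
```

Here the value that flows around the loop is the addend, so the form whose destination is the addend (231) avoids an extra register copy on each iteration.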
================
Comment at: llvm/test/CodeGen/X86/fma-intrinsics-x86.ll:485
@@ +484,3 @@
+; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}
+; CHECK-FMA-WIN-NEXT: vfnmsub213sd (%r8), %xmm1, %xmm0
+;
----------------
delena wrote:
> you check folding vector load into scalar intrinsic.
> On AVX-512 we support folding scalar load to scalar intrinsic., by matching scalar_to_vector(loadf32) pattern in td file
I agree, the check tests memory folding of a vector load into a scalar intrinsic.
Memory folding does not work for test cases such as the following (with or without my patch):
__m128d m = _mm_load_sd(mem);
__m128d res = _mm_fmadd_sd(a, b, m);
This should be fixed, and I think I know how to easily do that, but I would rather do that in a separate patch.
http://reviews.llvm.org/D13710