[PATCH] D13710: New X86 FMA3*_Int opcodes for scalar FMA intrinsics.

Wed Oct 28 09:46:16 PDT 2015

Hi Elena,

> On Oct 28, 2015, at 6:43 AM, Elena Demikhovsky <elena.demikhovsky at intel.com> wrote:
> 
> delena added a subscriber: delena.
> delena added a comment.
> 
> Is it possible to convert intrinsic to FMA node in DAG lowering phase, like we did in X86IntrinsicInfo.h?

I think the point of the patch is to keep those intrinsics around to emphasize that the user wants different semantics from the instructions that the compiler generates on its own. I.e., when the intrinsic is used, we want to copy the high bits of the first source into the destination, whereas for the plain instruction we do not have this constraint.
I admit that this sounds like a workaround for a bug in the model of the instruction, but currently we have no good way to model that properly.
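
To make the difference concrete, here is a rough sketch in plain C++ of the two behaviors. This is only a model, not code from the patch: Vec4 and the function names are made up for illustration, and a std::array stands in for a 128-bit register.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a 128-bit register holding four floats.
using Vec4 = std::array<float, 4>;

// Model of the intrinsic form: lane 0 gets a[0] * b[0] + c[0], and the
// upper three lanes must be copied from the first source `a`.
Vec4 fmadd_ss_intrinsic_model(const Vec4 &a, const Vec4 &b, const Vec4 &c) {
  Vec4 dst = a;                         // upper lanes come from `a`
  dst[0] = std::fma(a[0], b[0], c[0]);  // only the low lane is computed
  return dst;
}

// Model of the plain scalar form: only the scalar result matters, so
// nothing pins the upper lanes to any particular source operand.
float fmadd_scalar_model(float a, float b, float c) {
  return std::fma(a, b, c);
}

// Small usage example for the two models above.
void demo_upper_lanes() {
  const Vec4 a{2.0f, 10.0f, 20.0f, 30.0f};
  const Vec4 b{3.0f, 0.0f, 0.0f, 0.0f};
  const Vec4 c{4.0f, 0.0f, 0.0f, 0.0f};

  const Vec4 r = fmadd_ss_intrinsic_model(a, b, c);
  assert(r[0] == 10.0f);                                   // 2 * 3 + 4
  assert(r[1] == a[1] && r[2] == a[2] && r[3] == a[3]);    // copied from a

  assert(fmadd_scalar_model(a[0], b[0], c[0]) == r[0]);    // same low lane
}
```

In the first model the choice of which operand ends up tied to the destination is observable through the upper lanes; in the second it is not, which is the extra freedom the plain instruction has.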

Cheers,
Q.

> 
> 
> ================
> Comment at: llvm/lib/Target/X86/X86InstrInfo.cpp:1737
> @@ -1736,2 +1736,3 @@
>     { X86::VFMADDSSr231r,         X86::VFMADDSSr231m,         TB_ALIGN_NONE },
> +    { X86::VFMADDSSr231r_Int,     X86::VFMADDSSr231m_Int,     TB_ALIGN_NONE },
>     { X86::VFMADDSDr231r,         X86::VFMADDSDr231m,         TB_ALIGN_NONE },
> ----------------
> Do you have a test that checks memory folding of intrinsic?
> 
> ================
> Comment at: llvm/test/CodeGen/X86/fma-intrinsics-phi-213-to-231.ll:171
> @@ +170,3 @@
> +; CHECK-NEXT: retq
> +define <4 x float> @fmaddsubps_loop_128(i32 %iter, <4 x float> %a, <4 x float> %b, <4 x float> %c) {
> +entry:
> ----------------
> Why do you need so long test in order to check only one operation? The comment is related to all tests.
> 
> 
> http://reviews.llvm.org/D13710