[PATCH] D13710: New X86 FMA3*_Int opcodes for scalar FMA intrinsics.

Mon Nov 2 14:27:58 PST 2015

v_klochkov added a comment.

Hi Elena,

Thank you for reviewing the patch.
I have updated it to add more checks testing the memory folding optimization of FMAs generated for intrinsics.

Also, I am pretty sure that there is no need to use the scheme from X86IntrinsicInfo.h for SCALAR FMA intrinsics.
That approach is usable for packed FMA intrinsics, but for SCALAR FMA intrinscis that would require adding new SDNode operations, the existing FMADD,FMSUB,FNMSUB,FNMADD will not work for SCALAR intrinsics.
There are only few exceptions in X86IntinsicInfo.h where SCALAR intrinsics are handled there, but those are really special operations like VGETMANT intrinsics.

Thank you,
Slava


================
Comment at: llvm/lib/Target/X86/X86InstrInfo.cpp:1737
@@ -1736,2 +1736,3 @@
     { X86::VFMADDSSr231r,         X86::VFMADDSSr231m,         TB_ALIGN_NONE },
+    { X86::VFMADDSSr231r_Int,     X86::VFMADDSSr231m_Int,     TB_ALIGN_NONE },
     { X86::VFMADDSDr231r,         X86::VFMADDSDr231m,         TB_ALIGN_NONE },
----------------
delena wrote:
> Do you have a test that checks memory folding of intrinsic?
There were no any tests checking memory folding of FMA intrinsics.
In the updated patch updated the test fma-intrinsics-x86.ll such a way that it tests Windows target and it also checks Memory Folding.

================
Comment at: llvm/test/CodeGen/X86/fma-intrinsics-phi-213-to-231.ll:171
@@ +170,3 @@
+; CHECK-NEXT: retq
+define <4 x float> @fmaddsubps_loop_128(i32 %iter, <4 x float> %a, <4 x float> %b, <4 x float> %c) {
+entry:
----------------
delena wrote:
> Why do you need so long test in order to check only one operation? The comment is related to all tests.
This LIT test is created not by me, that would be a good question to author, but in my opinion the test is reasonably long.
(I suppose that you say that each of test cases is long, right?)
Test case here represents a very typical situation when there is an FMA instruction in a loop and there is a loop dependency going through the ADD path of FMA (i.e. through the 3rd operand of a*b + c operation).
This test case looks a pretty laconic representation of a loop.
 


http://reviews.llvm.org/D13710