[PATCH] D15741: [X86] Avoid folding scalar loads into unary sse intrinsics

Michael Kuperstein via llvm-commits llvm-commits at lists.llvm.org
Tue Dec 29 23:55:49 PST 2015


mkuper added a comment.

In http://reviews.llvm.org/D15741#317755, @spatel wrote:

> > This is consistent with the patterns we already have for the fp/int converts...
>
>
> We still need to fix converts?
>
>   #include <xmmintrin.h>
>   __m128 foo(__m128 x, int *y) { return _mm_cvtsi32_ss(x, *y); }
>   
>
> $ ./clang -O1 ss2si.c -S -o -
>
>   cvtsi2ssl  (%rdi), %xmm1  <--- false dependency on xmm1?
>   movss      %xmm1, %xmm0         


Right, I was talking about this:

  def CVTSD2SSrm  : I<0x5A, MRMSrcMem, (outs FR32:$dst), (ins f64mem:$src),
                      "cvtsd2ss\t{$src, $dst|$dst, $src}",
                      [(set FR32:$dst, (fround (loadf64 addr:$src)))],
                      IIC_SSE_CVT_Scalar_RM>, XD,
                      Requires<[UseSSE2, OptForSize]>, Sched<[WriteCvtF2FLd]>;

But this is actually the non-intrinsic pattern.
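
To make the distinction concrete, here is a rough C sketch (my own illustration, not code from the patch; it only assumes the standard emmintrin.h intrinsic). The pattern above matches the plain scalar truncation, while the intrinsic form goes through _mm_cvtsd_ss and passes the upper lanes of its first operand through:

  #include <emmintrin.h>

  // Non-intrinsic form: this is what the (fround (loadf64 ...)) pattern matches.
  float trunc_plain(const double *p) { return (float)*p; }

  // Intrinsic form: the upper three lanes of x are copied to the result, so the
  // instruction only partially updates its destination register.
  __m128 trunc_intrin(__m128 x, __m128d y) { return _mm_cvtsd_ss(x, y); }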


================
Comment at: lib/Target/X86/X86InstrSSE.td:3392
@@ +3391,3 @@
+  // We don't want to fold scalar loads into these instructions unless optimizing
+  // for size. This is because the folded instruction will have a partial register
+  // update, while the unfolded sequence will not, e.g.
----------------
spatel wrote:
> 80-cols.
The .td files aren't consistent about 80 columns, so I can never remember whether they're supposed to stay within it. Thanks. :-)
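
Coming back to the comment itself: for reference, this is the kind of folded vs. unfolded difference it describes, sketched with _mm_rcp_ss from xmmintrin.h (again my own illustration; the instruction sequences in the comments are the expected shapes, not verified compiler output):

  #include <xmmintrin.h>

  // Folded:    rcpss (%rdi), %xmm0
  //   writes only the low 32 bits of %xmm0, so it carries a false dependency
  //   on whatever was in %xmm0 beforehand.
  // Unfolded:  movss (%rdi), %xmm0    (full-register write, upper bits zeroed)
  //            rcpss %xmm0, %xmm0
  //   no dependency on a stale value, at the cost of an extra instruction,
  //   which is why the fold is only allowed under OptForSize.
  __m128 recip_lo(const float *p) { return _mm_rcp_ss(_mm_load_ss(p)); }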


http://reviews.llvm.org/D15741