[PATCH] D15741: [X86] Avoid folding scalar loads into unary sse intrinsics
Michael Kuperstein via llvm-commits
llvm-commits at lists.llvm.org
Tue Dec 29 23:55:49 PST 2015
mkuper added a comment.
In http://reviews.llvm.org/D15741#317755, @spatel wrote:
> > This is consistent with the patterns we already have for the fp/int converts...
>
>
> We still need to fix converts?
>
> #include <xmmintrin.h>
> __m128 foo(__m128 x, int *y) { return _mm_cvtsi32_ss(x, *y); }
>
>
> $ ./clang -O1 ss2si.c -S -o -
>
> cvtsi2ssl (%rdi), %xmm1 <--- false dependency on xmm1?
> movss %xmm1, %xmm0
Right, I was talking about this:
  def CVTSD2SSrm : I<0x5A, MRMSrcMem, (outs FR32:$dst), (ins f64mem:$src),
                     "cvtsd2ss\t{$src, $dst|$dst, $src}",
                     [(set FR32:$dst, (fround (loadf64 addr:$src)))],
                     IIC_SSE_CVT_Scalar_RM>,
                     XD,
                     Requires<[UseSSE2, OptForSize]>, Sched<[WriteCvtF2FLd]>;
But this is actually the non-intrinsic pattern.
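For reference, the source-level form that pattern matches is just a plain truncation of a loaded double, not an intrinsic call. A minimal sketch (function name is mine, not from the review):

  /* Matches (fround (loadf64 addr:$src)): a double loaded from memory and
   * truncated to float.  Per the quoted Requires<[UseSSE2, OptForSize]>,
   * the load is only folded into cvtsd2ss when optimizing for size;
   * otherwise the load and the conversion stay separate. */
  float trunc_load(const double *p) {
    return (float)*p;
  }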
================
Comment at: lib/Target/X86/X86InstrSSE.td:3392
@@ +3391,3 @@
+ // We don't want to fold scalar loads into these instructions unless optimizing
+ // for size. This is because the folded instruction will have a partial register
+ // update, while the unfolded sequence will not, e.g.
----------------
spatel wrote:
> 80-cols.
The .td files don't keep to 80 cols consistently, so I never remember whether they're supposed to. Thanks. :-)
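To spell out the folded-vs-unfolded difference the quoted comment refers to with "e.g.", here is a minimal reproducer for one of the unary intrinsics this patch covers (my own example, not from the patch; exact register allocation may differ):

  #include <xmmintrin.h>

  /* Only lane 0 of the result is used, so the load is a folding candidate.
   * Folded, this becomes something like "rcpss (%rdi), %xmm0", which writes
   * only the low 32 bits of %xmm0 and so carries a false dependency on the
   * register's previous contents.  Unfolded, "movss (%rdi), %xmm0" zeroes
   * the upper lanes and breaks the dependency before "rcpss %xmm0, %xmm0". */
  float rcp_load(const float *p) {
    return _mm_cvtss_f32(_mm_rcp_ss(_mm_load_ss(p)));
  }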
http://reviews.llvm.org/D15741