[PATCH] D15741: [X86] Avoid folding scalar loads into unary sse intrinsics
Sanjay Patel via llvm-commits
llvm-commits at lists.llvm.org
Wed Dec 30 08:39:47 PST 2015
spatel accepted this revision.
spatel added a comment.
This revision is now accepted and ready to land.
In http://reviews.llvm.org/D15741#317959, @mkuper wrote:
> def CVTSD2SSrm : I<0x5A, MRMSrcMem, (outs FR32:$dst), (ins f64mem:$src),
> "cvtsd2ss\t{$src, $dst|$dst, $src}",
> [(set FR32:$dst, (fround (loadf64 addr:$src)))],
> IIC_SSE_CVT_Scalar_RM>,
> XD,
> Requires<[UseSSE2, OptForSize]>, Sched<[WriteCvtF2FLd]>;
Ah, I managed to miss that one.
How about adding some 'FIXME' notes and/or changing the other defs since we're currently inconsistent about this? LGTM otherwise.
float f1(int *x) { return *x; }
double f2(int *x) { return *x; }
float f3(long long *x) { return *x; }
double f4(long long *x) { return *x; }
float f5(double *x) { return *x; }
double f6(float *x) { return *x; }
$ ./clang -O1 ss2si.c -S -o - |grep cvt
cvtsi2ssl (%rdi), %xmm0
cvtsi2sdl (%rdi), %xmm0
cvtsi2ssq (%rdi), %xmm0
cvtsi2sdq (%rdi), %xmm0
cvtsd2ss %xmm0, %xmm0
cvtss2sd %xmm0, %xmm0
Regarding handling this via ExeDepsFix - it's not clear to me that its current solution:
xorps %xmm0, %xmm0
cvtsi2ssl (%rdi) %xmm0
would be better than unfolding the load. I think the xorps instruction saves a byte in all cases, but it may be micro-arch-dependent whether that's actually cheaper?
================
Comment at: lib/Target/X86/X86InstrSSE.td:3392
@@ +3391,3 @@
+ // We don't want to fold scalar loads into these instructions unless optimizing
+ // for size. This is because the folded instruction will have a partial register
+ // update, while the unfolded sequence will not, e.g.
----------------
mkuper wrote:
> spatel wrote:
> > 80-cols.
> The TDs don't enforce 80-cols consistently, and I never remember whether they should. Thanks. :-)
It seems like we mostly try to follow the law, but I would fully support a new rule for these files.
80-cols causes a lot of extra suffering trying to make sense of this code that's already hard to read. :)
http://reviews.llvm.org/D15741
More information about the llvm-commits
mailing list