[PATCH] D15741: [X86] Avoid folding scalar loads into unary sse intrinsics

Michael Kuperstein via llvm-commits llvm-commits at lists.llvm.org
Thu Dec 31 01:24:08 PST 2015

mkuper added a comment.

Thanks, Sanjay.

> How about adding some 'FIXME' notes and/or changing the other defs since we're currently inconsistent about this? LGTM otherwise.
>
>   float f1(int *x) { return *x; }
>   double f2(int *x) { return *x; }
>   float f3(long long *x) { return *x; }
>   double f4(long long *x) { return *x; }
>   float f5(double *x) { return *x; }
>   double f6(float *x) { return *x; }

I'll add FIXMEs.

> Regarding handling this via ExeDepsFix - it's not clear to me that its current solution:
>
>   xorps %xmm0, %xmm0
>   cvtsi2ssl (%rdi), %xmm0
>
> would be better than unfolding the load. I think the xorps instruction saves a byte in all cases, but it may be micro-arch-dependent whether that's actually cheaper?

I think it generally is better (the xorps idiom should be recognized by any modern Intel CPU, at least). But David is the real authority on this.

