[PATCH] D15741: [X86] Avoid folding scalar loads into unary sse intrinsics

Thu Dec 31 01:24:08 PST 2015

mkuper added a comment.

Thanks, Sanjay.

> How about adding some 'FIXME' notes and/or changing the other defs since we're currently inconsistent about this? LGTM otherwise.
>
>   float f1(int *x) { return *x; }
>   double f2(int *x) { return *x; }
>   float f3(long long *x) { return *x; }
>   double f4(long long *x) { return *x; }
>   float f5(double *x) { return *x; }
>   double f6(float *x) { return *x; }

I'll add FIXMEs.
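For reference, the six quoted cases can be compiled as a standalone C file. Each one loads a scalar and converts it. On x86-64 these conversions typically select `cvtsi2ss`, `cvtsi2sd`, `cvtsd2ss` or `cvtss2sd` (or their VEX forms under AVX). Those instructions merge their result into the low element of an XMM register, so that register's previous contents become an input. The exact instruction selection depends on the compiler and flags, so treat this as a sketch for inspecting codegen (e.g. with `clang -O2 -S`), not as a test of the patch itself.

```c
#include <assert.h>  /* for the value checks used alongside these functions */

/* The six load+convert cases from the review. On x86-64 each typically
 * becomes a cvt* instruction with the load folded in as a memory operand,
 * and that instruction writes only the low element of its XMM destination. */
float  f1(int *x)       { return *x; }  /* int       -> float  (cvtsi2ss) */
double f2(int *x)       { return *x; }  /* int       -> double (cvtsi2sd) */
float  f3(long long *x) { return *x; }  /* long long -> float  (cvtsi2ss) */
double f4(long long *x) { return *x; }  /* long long -> double (cvtsi2sd) */
float  f5(double *x)    { return *x; }  /* double    -> float  (cvtsd2ss) */
double f6(float *x)     { return *x; }  /* float     -> double (cvtss2sd) */
```

All six values used in the checks are exactly representable, so the conversions compare equal without rounding concerns.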

> Regarding handling this via ExeDepsFix - it's not clear to me that its current solution:
>
>   xorps %xmm0, %xmm0
>   cvtsi2ssl (%rdi), %xmm0
>
> would be better than unfolding the load. I think the xorps instruction saves a byte in all cases, but it may be micro-arch-dependent whether that's actually cheaper?

I think it is generally better: any modern Intel CPU, at least, should recognize the xorps zeroing idiom and break the dependency on the register's old value. But David is the real authority on this.
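To make the dependency argument concrete, here is a sketch that models the quoted `xorps` + `cvtsi2ssl` sequence with SSE intrinsics. It assumes an x86 target with `<xmmintrin.h>`, and `convert_into_zeroed` is a hypothetical helper, not LLVM code. `_mm_cvtsi32_ss(a, b)` converts `b` into lane 0 and copies lanes 1-3 from `a`, so the result depends on `a`. Seeding `a` with `_mm_setzero_ps()`, which compilers usually materialize with `xorps`, removes any dependence on whatever the register held before. Which instructions the backend actually emits is up to the compiler.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics; x86/x86-64 only */

/* Hypothetical helper mirroring the ExeDepsFix output:
 *   xorps    %xmm0, %xmm0     <- _mm_setzero_ps(), a dependency-breaking idiom
 *   cvtsi2ss (mem), %xmm0     <- _mm_cvtsi32_ss(zero, *x)
 * Lanes 1-3 of the result come from the zeroed first operand, so nothing
 * depends on the register's previous (stale) contents. */
static __m128 convert_into_zeroed(const int *x) {
    __m128 zero = _mm_setzero_ps();
    return _mm_cvtsi32_ss(zero, *x);
}
```

Passing a register with stale contents, e.g. `_mm_set1_ps(9.0f)`, as the first argument instead leaves 9.0f in lanes 1-3. That carried-over value is exactly the input the hardware would otherwise have to wait for.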

http://reviews.llvm.org/D15741