[PATCH] D15741: [X86] Avoid folding scalar loads into unary sse intrinsics

Michael Kuperstein via llvm-commits llvm-commits at lists.llvm.org
Thu Dec 31 01:24:08 PST 2015


mkuper added a comment.

Thanks, Sanjay.

> How about adding some 'FIXME' notes and/or changing the other defs since we're currently inconsistent about this? LGTM otherwise.
>
>   float f1(int *x) { return *x; }
>   double f2(int *x) { return *x; }
>   float f3(long long *x) { return *x; }
>   double f4(long long *x) { return *x; }
>   float f5(double *x) { return *x; }
>   double f6(float *x) { return *x; }


I'll add FIXMEs.

> Regarding handling this via ExeDepsFix - it's not clear to me that its current solution:
>
>   xorps %xmm0, %xmm0
>   cvtsi2ssl (%rdi), %xmm0
>
> would be better than unfolding the load. I think the xorps instruction saves a byte in all cases, but it may be micro-arch-dependent whether that's actually cheaper?


I think it generally is better (the xorps zeroing idiom should be recognized as dependency-breaking by any modern Intel CPU, at least). But David is the real authority on this.
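
For readers following along, here is a rough C sketch of the two source-level shapes being discussed. The helper names are hypothetical and not from the patch. The first helper zeroes the destination vector before the conversion, which a compiler will typically (but not necessarily) lower to an xorps followed by cvtsi2ss. The second is the plain cast from f1 above, where instruction selection is left entirely to the backend. The actual instruction sequence depends on the compiler, its version, and the optimization flags, so treat the mapping as an assumption to check in the generated assembly.

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_setzero_ps, _mm_cvtsi32_ss, _mm_cvtss_f32 */

/* Hypothetical helper: int -> float conversion with an explicitly zeroed
 * destination. cvtsi2ss writes only the low 32 bits of its destination
 * register, so it depends on the register's previous contents; zeroing
 * the register first removes that dependency on older values. */
static float cvt_with_zeroed_dest(const int *x) {
    __m128 zero = _mm_setzero_ps();        /* usually lowered to xorps */
    __m128 r = _mm_cvtsi32_ss(zero, *x);   /* usually lowered to cvtsi2ss */
    return _mm_cvtss_f32(r);               /* extract the low float lane */
}

/* Plain C form, equivalent to f1 in the quoted list; the backend decides
 * whether to fold the load and whether to insert a zeroing instruction. */
static float cvt_plain(const int *x) {
    return (float)*x;
}
```

Both helpers return the same value for any input; only the generated code may differ. Compiling with something like `-O2 -S` and comparing the two function bodies is one way to see what a given compiler actually emits.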


http://reviews.llvm.org/D15741




