[PATCH] D15741: [X86] Avoid folding scalar loads into unary sse intrinsics
Michael Kuperstein via llvm-commits
llvm-commits at lists.llvm.org
Thu Dec 31 01:24:08 PST 2015
mkuper added a comment.
Thanks, Sanjay.
> How about adding some 'FIXME' notes and/or changing the other defs since we're currently inconsistent about this? LGTM otherwise.
>
> float f1(int *x) { return *x; }
> double f2(int *x) { return *x; }
> float f3(long long *x) { return *x; }
> double f4(long long *x) { return *x; }
> float f5(double *x) { return *x; }
> double f6(float *x) { return *x; }
I'll add FIXMEs.
> Regarding handling this via ExeDepsFix - it's not clear to me that its current solution:
>
> xorps %xmm0, %xmm0
> cvtsi2ssl (%rdi), %xmm0
>
> would be better than unfolding the load. I think the xorps instruction saves a byte in all cases, but it may be micro-arch-dependent whether that's actually cheaper?
I think it generally is better (xorps of a register with itself should be recognized as a dependency-breaking zeroing idiom by any modern Intel CPU, at least). But David is the real authority on this.
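For reference, a rough sketch of the two shapes being compared, written in C with SSE intrinsics. The function names are made up for illustration, and the assembly in the comments is only what a compiler would typically be expected to emit, not a guarantee of what any particular LLVM version produces. The first function spells out the "zero the destination, then convert with a folded load" pattern; the second is the plain conversion from the f1 example, where the choice is left to the backend.

```c
#include <xmmintrin.h>  /* SSE: _mm_setzero_ps, _mm_cvtsi32_ss, _mm_cvtss_f32 */

/* Hypothetical helper: break the dependency on the destination register
 * explicitly, then convert. cvtsi2ss writes only the low 32 bits of its
 * destination, so without the zeroing it depends on the register's old value. */
float cvt_zero_then_convert(const int *x) {
    __m128 dst = _mm_setzero_ps();   /* usually xorps %xmm0, %xmm0 */
    dst = _mm_cvtsi32_ss(dst, *x);   /* usually cvtsi2ssl (%rdi), %xmm0 */
    return _mm_cvtss_f32(dst);       /* read back the low float lane */
}

/* Hypothetical helper: the plain conversion, same as f1 above. Whether the
 * load is folded, unfolded, or preceded by xorps is up to the compiler. */
float cvt_plain(const int *x) {
    return (float)*x;
}
```

Both functions return the same value for any input; they differ only in how much of the instruction selection is pinned down in the source.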
http://reviews.llvm.org/D15741