[PATCH] D27692: [x86] use a single shufps when it can save instructions
Michael Kuperstein via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Dec 13 11:06:50 PST 2016
mkuper added a subscriber: wmi.
mkuper added a comment.
Just wanted to point out that the other direction for this also exists.
@wmi ran into this:
#include <pmmintrin.h>
__m128 c, d, e;
void foo(__m128 a, __m128 b) {
  e = a;
  a = _mm_shuffle_ps(a, a, 0x0);
  c = _mm_mul_ps(e, b);
  d = _mm_add_ps(a, b);
}
We generate:
movaps %xmm0, e(%rip)
movaps %xmm0, %xmm2
shufps $0, %xmm2, %xmm2 # xmm2 = xmm2[0,0,0,0]
mulps %xmm1, %xmm0
movaps %xmm0, c(%rip)
addps %xmm1, %xmm2
movaps %xmm2, d(%rip)
retq
That's because we don't even try to match a pshufd for a shuffle in the float domain, even though it would save the extra movaps copy. We could do something like:
movaps %xmm0, e(%rip)
pshufd $0, %xmm0, %xmm2 # xmm2 = xmm0[0,0,0,0]
mulps %xmm1, %xmm0
movaps %xmm0, c(%rip)
addps %xmm1, %xmm2
movaps %xmm2, d(%rip)
retq
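To illustrate why the two listings above compute the same value, here is a minimal scalar sketch of the two shuffles. It assumes the usual immediate encodings (shufps takes its low two result lanes from the destination and its high two from the source; pshufd takes all four from the source). The names vec128, model_shufps, model_pshufd and unary_shufps_matches_pshufd are hypothetical helpers for this example, not anything in LLVM. It models lane selection only and says nothing about execution-domain cost.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a 128-bit register as four 32-bit lanes. */
typedef struct { uint32_t lane[4]; } vec128;

/* shufps $imm, src, dst (AT&T order): the low two result lanes are
 * selected from dst, the high two from src. */
static vec128 model_shufps(vec128 dst, vec128 src, unsigned imm) {
    vec128 r;
    r.lane[0] = dst.lane[imm & 3];
    r.lane[1] = dst.lane[(imm >> 2) & 3];
    r.lane[2] = src.lane[(imm >> 4) & 3];
    r.lane[3] = src.lane[(imm >> 6) & 3];
    return r;
}

/* pshufd $imm, src, dst: every result lane is selected from src, so
 * dst does not have to be a copy of src beforehand. */
static vec128 model_pshufd(vec128 src, unsigned imm) {
    vec128 r;
    for (int i = 0; i < 4; ++i)
        r.lane[i] = src.lane[(imm >> (2 * i)) & 3];
    return r;
}

/* Returns 1 if shufps with both operands equal to x selects the same
 * lanes as pshufd on x for every 8-bit immediate, 0 otherwise. */
static int unary_shufps_matches_pshufd(vec128 x) {
    for (unsigned imm = 0; imm < 256; ++imm) {
        vec128 a = model_shufps(x, x, imm);
        vec128 b = model_pshufd(x, imm);
        if (memcmp(&a, &b, sizeof a) != 0)
            return 0;
    }
    return 1;
}
```

In this model, the movaps + shufps pair in the first listing corresponds to model_shufps(x, x, 0), and the single pshufd in the second listing corresponds to model_pshufd(x, 0).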
https://reviews.llvm.org/D27692