[PATCH] D27692: [x86] use a single shufps when it can save instructions

Michael Kuperstein via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Dec 13 11:06:50 PST 2016


mkuper added a subscriber: wmi.
mkuper added a comment.

Just wanted to point out that the opposite direction also exists: sometimes a pshufd would save an instruction over a shufps.

@wmi ran into this:

  #include <pmmintrin.h>
  
  __m128 c, d, e;
  
  void foo(__m128 a, __m128 b) {
    e = a;
    a = _mm_shuffle_ps(a, a, 0x0);
    c = _mm_mul_ps(e, b);
    d = _mm_add_ps(a, b);
  }

We generate:

  movaps  %xmm0, e(%rip)
  movaps  %xmm0, %xmm2
  shufps  $0, %xmm2, %xmm2        # xmm2 = xmm2[0,0,0,0]
  mulps   %xmm1, %xmm0
  movaps  %xmm0, c(%rip)
  addps   %xmm1, %xmm2
  movaps  %xmm2, d(%rip)
  retq

This happens because we don't even try to match a pshufd in the float domain. Since pshufd takes a separate source operand, it can write %xmm2 directly from %xmm0, which removes the movaps copy:

  movaps  %xmm0, e(%rip)
  pshufd  $0, %xmm0, %xmm2        # xmm2 = xmm0[0,0,0,0]
  mulps   %xmm1, %xmm0
  movaps  %xmm0, c(%rip)
  addps   %xmm1, %xmm2
  movaps  %xmm2, d(%rip)
  retq


https://reviews.llvm.org/D27692
