[PATCH] D14901: [X86][SSE] Improve i16 splatting shuffles

Wed Dec 16 12:06:45 PST 2015

RKSimon added inline comments.

================
Comment at: test/CodeGen/X86/avx-splat.ll:18
@@ -18,1 +17,3 @@
+; CHECK-NEXT:    vpshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,5,5,5,5]
+; CHECK-NEXT:    vpshufd {{.*#+}} xmm0 = xmm0[2,2,3,3]
 ; CHECK-NEXT:    vinsertf128 $1, %xmm0, %ymm0, %ymm0
----------------
qcolombet wrote:
> I am missing something.
> 
> You said:
> > Note this has the side effect of a lot of SSSE3 test cases no longer needing to use PSHUFB, as it falls below the 3 op combine threshold for when PSHUFB is typically worth it.
> 
> But isn’t pshufb always better than two other pshufX instructions?
On all but the newest machines (Haswell and later), PSHUFB has a latency of 3 or more cycles, which is where the 3-op threshold comes from. The bigger problem is that PSHUFB nearly always has to load its shuffle mask operand from the constant pool, which is much more costly than up to 3 shuffles that encode their masks as immediates.
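To make the trade-off concrete, here is a small lane-level Python model of the two immediate shuffles in the CHECK lines above, alongside the single PSHUFB that would produce the same i16 splat of word 5. It is an illustrative sketch, not LLVM code, and the helper names (pshufhw, pshufd, pshufb, shuffle_imm) are hypothetical. The PSHUFHW and PSHUFD masks each fit in an 8-bit immediate, whereas the PSHUFB version needs a 16-byte mask vector, which is what typically ends up in the constant pool.

```python
# Lane-level model of the SSE shuffles discussed above (illustrative only).
# Lane values are string labels so the data movement is easy to follow.

def shuffle_imm(sel):
    """Pack four 2-bit selectors into the 8-bit immediate used by PSHUFHW/PSHUFD."""
    assert len(sel) == 4 and all(0 <= s < 4 for s in sel)
    return sum(s << (2 * i) for i, s in enumerate(sel))

def pshufhw(words, hi_sel):
    """PSHUFHW: i16 lanes 0-3 pass through; lane 4+i takes lane 4+hi_sel[i]."""
    assert len(words) == 8 and all(0 <= s < 4 for s in hi_sel)
    return words[:4] + [words[4 + s] for s in hi_sel]

def pshufd(words, dword_sel):
    """PSHUFD: view 8 x i16 as 4 x i32 (pairs of words) and permute the dwords."""
    dwords = [words[2 * i:2 * i + 2] for i in range(4)]
    return [w for s in dword_sel for w in dwords[s]]

def pshufb(byte_lanes, mask):
    """PSHUFB (128-bit): byte i = src[mask[i] & 15], or zero (None here) if bit 7 is set."""
    return [None if m & 0x80 else byte_lanes[m & 0x0F] for m in mask]

src = ["w%d" % i for i in range(8)]

# Two shuffles with immediate masks, as in the CHECK lines.
step1 = pshufhw(src, [1, 1, 1, 1])   # xmm0[0,1,2,3,5,5,5,5], imm 0x55
step2 = pshufd(step1, [2, 2, 3, 3])  # xmm0[2,2,3,3], imm 0xFA
print(step1)
print(step2)
print(hex(shuffle_imm([1, 1, 1, 1])), hex(shuffle_imm([2, 2, 3, 3])))

# One PSHUFB: i16 lane 5 is bytes 10 (low) and 11 (high) in little-endian order,
# so the splat needs a full 16-byte mask rather than an 8-bit immediate.
src_bytes = [(w, half) for w in src for half in ("lo", "hi")]
pshufb_mask = [10, 11] * 8
byte_result = pshufb(src_bytes, pshufb_mask)
print(byte_result)
```

Both paths yield word 5 in every lane; the difference the thread is about is cost: two immediate-encoded shuffles versus one PSHUFB plus a load of its mask vector.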


Repository:
  rL LLVM

http://reviews.llvm.org/D14901