[PATCH] D38318: [X86][SSE] Match PSHUFLW/PSHUFHW + PSHUFD vXi16 shuffle patterns (PR34686)

Wed Sep 27 10:49:50 PDT 2017

delena added inline comments.

================
Comment at: test/CodeGen/X86/vector-shuffle-128-v8.ll:1976
+; AVX2:       # BB#0:
+; AVX2-NEXT:    vpshuflw {{.*#+}} xmm0 = xmm0[0,1,1,0,4,5,6,7]
+; AVX2-NEXT:    vpbroadcastq %xmm0, %xmm0
----------------
RKSimon wrote:
> zvi wrote:
> > Looks like AVX2, AVX512 regressed. Any idea what happened?
> We went under the 3-op threshold for combining unary shuffles to PSHUFB (where before it was the PSHUFD+PSHUFLW+PSHUFHW code from SSE2). Despite being 2 ops, this is much smaller in codesize due to not requiring a constant pool entry. It also makes folding easier.
Loading a constant from memory can be hoisted out of a loop, and two shuffles instead of one increase shuffle port pressure.
I think the original "pshufb" is better in this case.
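
For reference, here is a minimal Python sketch of the two sequences being compared. It is only a lane-level model, not the actual lowering code: the helper names (`pshuflw`, `broadcastq`, `pshufb`, `to_bytes`, `to_words`) and the sample input are made up for illustration, and the byte mask is derived from the CHECK lines above rather than copied from the old test output. It shows that the vpshuflw + vpbroadcastq pair produces the same result as a single byte shuffle with a 16-byte constant mask.

```python
# Lane-level model of the shuffles discussed above (illustrative only).
# A 128-bit register is modelled as 8 words (16-bit) or 16 bytes, little-endian.

def to_bytes(words):
    """Split each 16-bit word into its low and high byte."""
    out = []
    for w in words:
        out += [w & 0xFF, w >> 8]
    return out

def to_words(data):
    """Reassemble 16 bytes into 8 little-endian 16-bit words."""
    return [data[i] | (data[i + 1] << 8) for i in range(0, 16, 2)]

def pshuflw(words, order):
    """Permute the low four words; the high four words pass through."""
    return [words[i] for i in order] + words[4:]

def broadcastq(words):
    """Copy the low 64 bits (four words) into both halves."""
    return words[:4] * 2

def pshufb(data, mask):
    """Byte shuffle: a mask byte with bit 7 set yields zero,
    otherwise the low four bits select a source byte."""
    return [0 if m & 0x80 else data[m & 0x0F] for m in mask]

# Hypothetical input, chosen so every byte is distinct.
src = [0x1100, 0x3322, 0x5544, 0x7766, 0x9988, 0xBBAA, 0xDDCC, 0xFFEE]

# Two-op sequence from the new CHECK lines:
#   vpshuflw xmm0 = xmm0[0,1,1,0,4,5,6,7] ; vpbroadcastq %xmm0, %xmm0
two_op = broadcastq(pshuflw(src, [0, 1, 1, 0]))

# Equivalent single byte shuffle: expand the final word order into byte indices.
word_order = [0, 1, 1, 0, 0, 1, 1, 0]
mask = [2 * w + b for w in word_order for b in (0, 1)]
one_op = to_words(pshufb(to_bytes(src), mask))

assert two_op == one_op
```

The single-shuffle form needs the 16-byte `mask` as a constant, which is the constant pool entry mentioned above; the two-op form encodes its control in immediates and the opcode instead.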

Repository:
  rL LLVM

https://reviews.llvm.org/D38318