[PATCH] D38318: [X86][SSE] Match PSHUFLW/PSHUFHW + PSHUFD vXi16 shuffle patterns (PR34686)

Wed Sep 27 12:58:10 PDT 2017

RKSimon added inline comments.

================
Comment at: test/CodeGen/X86/vector-shuffle-128-v8.ll:1976
+; AVX2:       # BB#0:
+; AVX2-NEXT:    vpshuflw {{.*#+}} xmm0 = xmm0[0,1,1,0,4,5,6,7]
+; AVX2-NEXT:    vpbroadcastq %xmm0, %xmm0
----------------
delena wrote:
> RKSimon wrote:
> > zvi wrote:
> > > Looks like AVX2, AVX512 regressed. Any idea what happened?
> > We went under the 3-op threshold for combining unary shuffles to PSHUFB (where before it was the PSHUFD+PSHUFLW+PSHUFHW code from SSE2). Despite being 2 ops, this is much smaller in codesize due to not requiring a constant pool entry. It also makes folding easier.
> Loading a constant from memory may be done outside the loop. And two shuffles instead of one increase shuffle port pressure.
> I think that the original "pshufb" is better in this case.
That's true after Haswell/Zen, but not for any older SSSE3+ capable targets. We have hard-coded depth controls in combineX86ShuffleChain; we've been putting off changing them because we'd ideally drive this from the scheduler.
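
For readers following the trade-off, here is a minimal C++ sketch that models the two forms on plain arrays. It is not LLVM code: the helper names (pshuflw, vpbroadcastq, pshufb, toBytes, twoOpSequence, onePshufb) are hypothetical, and the byte mask kPshufbMask is derived here from the two CHECK lines above rather than copied from the old test output. It only shows that the 2-op sequence and a single PSHUFB with a 16-byte constant compute the same permutation; that constant is what would live in the constant pool.

```cpp
#include <array>
#include <cstdint>

using V8i16 = std::array<uint16_t, 8>;
using V16i8 = std::array<uint8_t, 16>;

// PSHUFLW: permute the low four words by `mask`; the high four pass through.
V8i16 pshuflw(const V8i16 &v, const std::array<unsigned, 4> &mask) {
  V8i16 r = v;
  for (unsigned i = 0; i < 4; ++i)
    r[i] = v[mask[i]];
  return r;
}

// VPBROADCASTQ (128-bit form): copy the low 64 bits (words 0..3) to both halves.
V8i16 vpbroadcastq(const V8i16 &v) {
  V8i16 r{};
  for (unsigned i = 0; i < 8; ++i)
    r[i] = v[i % 4];
  return r;
}

// PSHUFB: each result byte picks a source byte by index; bit 7 set zeroes it.
V16i8 pshufb(const V16i8 &v, const V16i8 &mask) {
  V16i8 r{};
  for (unsigned i = 0; i < 16; ++i)
    r[i] = (mask[i] & 0x80) ? uint8_t{0} : v[mask[i] & 0x0F];
  return r;
}

// Little-endian byte view of a v8i16 value.
V16i8 toBytes(const V8i16 &v) {
  V16i8 r{};
  for (unsigned i = 0; i < 8; ++i) {
    r[2 * i] = static_cast<uint8_t>(v[i] & 0xFF);
    r[2 * i + 1] = static_cast<uint8_t>(v[i] >> 8);
  }
  return r;
}

// The 2-op form from the CHECK lines: vpshuflw [0,1,1,0,4,5,6,7] + vpbroadcastq.
V8i16 twoOpSequence(const V8i16 &v) {
  return vpbroadcastq(pshuflw(v, {0, 1, 1, 0}));
}

// Assumed equivalent 1-op form: word order [0,1,1,0,0,1,1,0] expanded to bytes.
const V16i8 kPshufbMask = {0, 1, 2, 3, 2, 3, 0, 1, 0, 1, 2, 3, 2, 3, 0, 1};

V16i8 onePshufb(const V16i8 &bytes) { return pshufb(bytes, kPshufbMask); }
```

Comparing toBytes(twoOpSequence(v)) with onePshufb(toBytes(v)) for any input v gives identical bytes, so the choice between the two is purely about cost (instruction count, shuffle port pressure, constant pool load), not about the result.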

Repository:
  rL LLVM

https://reviews.llvm.org/D38318