[PATCH] D38318: [X86][SSE] Match PSHUFLW/PSHUFHW + PSHUFD vXi16 shuffle patterns (PR34686)

Wed Sep 27 13:44:18 PDT 2017

RKSimon added inline comments.

================
Comment at: test/CodeGen/X86/vector-shuffle-128-v8.ll:1976
+; AVX2:       # BB#0:
+; AVX2-NEXT:    vpshuflw {{.*#+}} xmm0 = xmm0[0,1,1,0,4,5,6,7]
+; AVX2-NEXT:    vpbroadcastq %xmm0, %xmm0
----------------
RKSimon wrote:
> delena wrote:
> > RKSimon wrote:
> > > zvi wrote:
> > > > Looks like AVX2, AVX512 regressed. Any idea what happened?
> > > We went under the 3-op threshold for combining unary shuffles to PSHUFB (where before it was the PSHUFD+PSHUFLW+PSHUFHW code from SSE2). Despite being 2 ops, this is much smaller in codesize due to not requiring a constant pool entry. It also makes folding easier.
> > Loading a constant from memory may be done outside the loop. And two shuffles instead of one increase shuffle port pressure.
> > I think that the original "pshufb" is better in this case.
> That's true after Haswell/Zen, but not for any older SSSE3+ capable targets. We have hard-coded depth controls in combineX86ShuffleChain; we've been putting off changing this, as we'd ideally drive it from the scheduler.
I've committed rL314337 which could be used to permit earlier combining of shuffles to variable masks such as PSHUFB. Ideally though this would be done at a later stage where we have more scheduler details (MC, register pressure etc.).
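
For reference, here is a rough Python model of the lane arithmetic in the AVX2 check lines above. It is only a sketch: the helper names are made up for illustration and are not LLVM APIs, and it treats PSHUFB as a word-granular permute even though the real instruction works on bytes with a mask held in a register or memory. It suggests that the PSHUFLW + VPBROADCASTQ pair yields the same v8i16 permutation as a single variable-mask shuffle would.

```python
def pshuflw(v, mask4):
    """Permute the low four 16-bit lanes by mask4; the high four pass through."""
    assert len(v) == 8 and len(mask4) == 4
    return [v[i] for i in mask4] + v[4:]


def pbroadcastq(v):
    """Replicate the low 64 bits (lanes 0..3) into both halves of the vector."""
    assert len(v) == 8
    return v[:4] * 2


def pshufb_words(v, mask8):
    """Hypothetical word-level stand-in for one PSHUFB with a constant mask."""
    assert len(v) == 8 and len(mask8) == 8
    return [v[i] for i in mask8]


src = list("abcdefgh")  # eight i16 lanes, labelled for readability

# Two immediate-mask ops, as in the new AVX2 output (no constant pool entry).
two_op = pbroadcastq(pshuflw(src, [0, 1, 1, 0]))

# One variable-mask op, standing in for the previous PSHUFB codegen.
one_op = pshufb_words(src, [0, 1, 1, 0, 0, 1, 1, 0])

print(two_op == one_op)
```

The tradeoff discussed in this thread is then between one shuffle plus a mask load (PSHUFB) and two shuffles with immediate operands only.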

Repository:
  rL LLVM

https://reviews.llvm.org/D38318