[PATCH] D38318: [X86][SSE] Match PSHUFLW/PSHUFHW + PSHUFD vXi16 shuffle patterns (PR34686)
Simon Pilgrim via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Sep 27 12:58:10 PDT 2017
RKSimon added inline comments.
================
Comment at: test/CodeGen/X86/vector-shuffle-128-v8.ll:1976
+; AVX2: # BB#0:
+; AVX2-NEXT: vpshuflw {{.*#+}} xmm0 = xmm0[0,1,1,0,4,5,6,7]
+; AVX2-NEXT: vpbroadcastq %xmm0, %xmm0
----------------
delena wrote:
> RKSimon wrote:
> > zvi wrote:
> > > Looks like AVX2, AVX512 regressed. Any idea what happened?
> > We went under the 3-op threshold for combining unary shuffles to PSHUFB (before, it was the PSHUFD+PSHUFLW+PSHUFHW code from SSE2). Despite being 2 ops, this is much smaller in code size because it does not require a constant pool entry. It also makes folding easier.
> Loading a constant from memory may be done outside the loop, and two shuffles instead of one increase shuffle port pressure.
> I think that the original "pshufb" is better in this case.
That's true after Haswell/Zen, but not for any older SSSE3+ capable targets. We have hard-coded depth controls in combineX86ShuffleChain; we've been putting off changing this because we'd ideally drive it from the scheduler.
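
For reference, here is a minimal sketch of the equivalence under discussion. It models the vector as eight 16-bit lanes and shows that the PSHUFLW + PBROADCASTQ pair from the AVX2 check lines yields the same lanes a single PSHUFB would, along with the 16-byte control constant that the PSHUFB would need to load. This is plain Python, not LLVM code; the helper names (`pshuflw`, `pbroadcastq`, `to_pshufb_mask`) are hypothetical and exist only for illustration.

```python
def pshuflw(src, low_mask):
    """Permute the low four 16-bit lanes by low_mask; the high four pass through."""
    return [src[i] for i in low_mask] + src[4:8]


def pbroadcastq(src):
    """Repeat the low 64 bits (four 16-bit lanes) across the 128-bit vector."""
    return src[0:4] * 2


def to_pshufb_mask(word_mask):
    """Expand a 16-bit lane mask into the 16 byte indices a PSHUFB control would hold."""
    return [2 * w + b for w in word_mask for b in (0, 1)]


# Start from the identity so each output lane records which input lane it came from.
identity = list(range(8))

# vpshuflw xmm0 = xmm0[0,1,1,0,4,5,6,7], then vpbroadcastq %xmm0, %xmm0
combined = pbroadcastq(pshuflw(identity, [0, 1, 1, 0]))

# The equivalent single-shuffle form needs this constant in memory.
byte_mask = to_pshufb_mask(combined)

print(combined)
print(byte_mask)
```

The two-instruction form encodes its control in an immediate, while the single PSHUFB form has to load the 16-byte mask from the constant pool; that is the code size versus shuffle port pressure trade-off discussed above.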
Repository:
rL LLVM
https://reviews.llvm.org/D38318