[PATCH] D86429: [X86] Make lowerShuffleAsLanePermuteAndShuffle use sublanes on AVX2

Mon Aug 24 00:34:50 PDT 2020

TellowKrinkle added a comment.

While most of the test output changes seemed positive, a few stood out to me as being worse:

- shuffle_v16i16_00_16_01_17_02_18_03_27_08_24_09_25_10_26_11_27 and shuffle_v16i16_00_16_01_17_02_18_03_27_08_24_09_25_10_26_11_27 were previously lowered to a blend of two `vpunpcklwd`s, which was then optimized into a single `vpunpcklwd`.  Since the new codegen no longer emits a `vpunpcklwd`, that optimization no longer applies.
- The AVX512VL output for shuffle_v16i16_uu_uu_uu_uu_04_05_06_11_uu_uu_uu_uu_12_13_14_11 now loses track of the fact that a bunch of its outputs are undef.  The output from lowerShuffleAsLanePermuteAndShuffle is a <4,5,6,7,8,9,10,11,12,13,14,15,8,9,10,11> shuffle followed by a <u,u,u,u,0,1,2,7,u,u,u,u,8,9,10,15> shuffle, which get combined back together in a later pass.  Not sure if this is important.
- The output of the pass is often asymmetric, and therefore doesn't work well with fixed shuffles, so almost everything ends up using vpshufb.  On systems with fast-variable-shuffle this is fine, since it usually saves a few instructions, but on systems without it the results are mixed.  Many byte shuffles already required vpshufb anyway, and this can reduce the number of them or at least the number of other instructions, but many word shuffles go from not requiring variable shuffles to requiring them (e.g. shuffle_v16i16_00_01_00_01_02_03_02_11_08_09_08_09_10_11_10_11).
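
For anyone who wants to check the two masks quoted in the AVX512VL item above, here is a minimal Python sketch that composes them.  It is not LLVM code: `compose_masks` and `UNDEF` are names made up for illustration, and it assumes both shuffles are single-input shuffles on a 16-element vector, with `None` standing in for an undef element.

```python
# Hypothetical helper, not part of LLVM: models a single-input shuffle mask
# as a list of source indices, with None standing in for undef ("u").
UNDEF = None


def compose_masks(first, second):
    """Return the one mask equivalent to applying `first`, then `second`.

    Element i of the result reads element second[i] of the intermediate
    vector, which in turn came from element first[second[i]] of the input.
    Undef elements in `second` stay undef.
    """
    return [UNDEF if idx is UNDEF else first[idx] for idx in second]


u = UNDEF
# The two masks quoted in the comment above.
first = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 8, 9, 10, 11]
second = [u, u, u, u, 0, 1, 2, 7, u, u, u, u, 8, 9, 10, 15]

combined = compose_masks(first, second)
print(combined)
```

Under those assumptions the composed mask is undef wherever the second mask is undef, and the defined elements are 4, 5, 6, 11 and 12, 13, 14, 11, which matches the indices in the test name.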

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86429/new/

https://reviews.llvm.org/D86429