[PATCH] D72575: [x86] try harder to form 256-bit unpck*

Mon Jan 13 13:05:44 PST 2020

spatel marked an inline comment as done.
spatel added inline comments.

================
Comment at: llvm/test/CodeGen/X86/vector-shuffle-256-v8.ll:1721
 ; AVX2-FAST-NEXT:    vpermps %ymm0, %ymm2, %ymm0
-; AVX2-FAST-NEXT:    vmovaps {{.*#+}} ymm2 = <u,0,1,1,u,2,3,3>
+; AVX2-FAST-NEXT:    vmovaps {{.*#+}} ymm2 = [0,0,1,1,2,2,3,3]
 ; AVX2-FAST-NEXT:    vpermps %ymm1, %ymm2, %ymm1
----------------
RKSimon wrote:
> lebedev.ri wrote:
> > This looks like some demandedelts deficiency?
> D66004 might catch it, else it might be due to the constant mask already being lowered - we don't do much to simplify constant vectors already lowered to the constant pool.
I think we've already lowered to constant pool loads.

This patch creates the unpack as expected, but then a later combine does:
  Combining: t23: v8i32 = X86ISD::UNPCKL t22, t22
  Creating new node: t60: v8i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<1>, Constant:i32<1>, Constant:i32<2>, Constant:i32<2>, Constant:i32<3>, Constant:i32<3>
  Creating new node: t61: v8i32 = X86ISD::VPERMV t60, t4
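
To illustrate why that combine ends up with the [0,0,1,1,2,2,3,3] mask, here is a small sketch of the per-128-bit-lane behaviour of a 256-bit UNPCKL on 32-bit elements. The definition of t22 is not shown in the dump above, so the lane-crossing permute used for it here is only an assumption, and the helper names (`shuffle`, `unpckl_256`) are made up for this example.

```python
def shuffle(src, mask):
    """Single-source shuffle: result[i] = src[mask[i]]."""
    return [src[m] for m in mask]


def unpckl_256(a, b):
    """Model of a 256-bit UNPCKL on 8 x 32-bit elements.

    The interleave works independently in each 128-bit lane (4 elements):
    it takes the low two elements of each lane of a and b.
    """
    out = []
    for lane in range(2):
        base = lane * 4
        out += [a[base], b[base], a[base + 1], b[base + 1]]
    return out


# Use element indices as values so the result reads as a shuffle mask.
src = list(range(8))

# ASSUMPTION: t22 is a lane-crossing permute that places elements 0,1 in the
# low half of lane 0 and elements 2,3 in the low half of lane 1.
t22 = shuffle(src, [0, 1, 0, 1, 2, 3, 2, 3])

# UNPCKL t22, t22 then behaves like one variable permute of the source.
combined_mask = unpckl_256(t22, t22)
print(combined_mask)
```

Under that assumption, folding the two shuffles into one yields the same mask that appears in the BUILD_VECTOR feeding the VPERMV.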

And then we lower the build_vector:
        t65: v8i32,ch = load<(load 32 from constant-pool)> t0, t67, undef:i64
        t4: v8i32,ch = CopyFromReg t0, Register:v8i32 %1
      t61: v8i32 = X86ISD::VPERMV t65, t4
       t49: v8i32,ch = load<(load 32 from constant-pool)> t0, t51, undef:i64
        t2: v8i32,ch = CopyFromReg t0, Register:v8i32 %0
      t43: v8i32 = X86ISD::VPERMV t49, t2
    t16: v8i32 = X86ISD::BLENDI t61, t43, TargetConstant:i8<17>

All of this happens before we reach the BLENDI, so it doesn't seem like there'd be much chance of improvement this late.
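
For reference, a rough sketch of which mask elements the blend actually demands. It assumes the usual BLENDI immediate convention (a set bit i selects element i from the second operand); the function names are hypothetical and this is not the LLVM implementation.

```python
def blend_demanded(imm, num_elts=8):
    """Split element indices by which BLENDI operand supplies them.

    Assumes bit i set in the immediate selects element i from the second
    operand, and bit i clear selects it from the first operand.
    """
    first = [i for i in range(num_elts) if not (imm >> i) & 1]
    second = [i for i in range(num_elts) if (imm >> i) & 1]
    return first, second


def relax_mask(mask, demanded):
    """Replace mask entries that nobody demands with 'u' (undef)."""
    return [m if i in demanded else "u" for i, m in enumerate(mask)]


# BLENDI t61, t43, TargetConstant:i8<17>  (17 == 0b00010001)
first, second = blend_demanded(17)

# Elements 0 and 4 of t61 are overwritten by t43, so the permute mask that
# feeds t61 does not need defined values in those positions.
relaxed = relax_mask([0, 0, 1, 1, 2, 2, 3, 3], first)
print(first, second, relaxed)
```

With immediate 17, elements 0 and 4 of the first operand are not demanded, which corresponds to the <u,0,1,1,u,2,3,3> mask in the old test output. Once the mask is a constant-pool load, that information is no longer easy to apply.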

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72575/new/

https://reviews.llvm.org/D72575