[PATCH] D96405: [DAGCombiner] Improve reduceBuildVecToShuffle Performance

Thu Feb 11 01:32:58 PST 2021

mmarjieh added inline comments.

================
Comment at: llvm/test/CodeGen/X86/vector-shuffle-combining-avx512bwvl.ll:115
+; X86-NEXT:    vpsraw $8, %xmm1, %xmm1
+; X86-NEXT:    vpunpcklqdq {{.*#+}} ymm0 = ymm0[0],ymm1[0],ymm0[2],ymm1[2]
+; X86-NEXT:    vmovdqu %ymm0, (%eax)
----------------
RKSimon wrote:
> At a quick glance, this looks wrong. I'd expect this to still be the same vshufpd?
I am not familiar with the X86 ISA.
Can you explain why you would expect the same vshufpd here?
In the meantime, here is how the DAG differs before and after my patch:

Before this patch:
SelectionDAG has 28 nodes:
  t0: ch = EntryToken
  t5: v4i64,ch = load<(load 32 from `<4 x i64>* null`, align 8)> t0, Constant:i32<0>, undef:i32
  t6: v4i64,ch = load<(load 32 from `<4 x i64>* undef`, align 8)> t0, undef:i32, undef:i32
      t21: ch = TokenFactor t5:1, t6:1
              t36: v8i16 = BUILD_VECTOR Constant:i16<0>, Constant:i16<0>, Constant:i16<0>, Constant:i16<0>, undef:i16, undef:i16, undef:i16, undef:i16
            t52: v2f64 = bitcast t36
          t62: v4f64 = concat_vectors t52, undef:v2f64
        t63: v4f64 = vector_shuffle<u,u,0,0> t62, undef:v4f64
              t37: v8i16 = X86ISD::VTRUNC t5
            t39: v8i16 = sign_extend_inreg t37, ValueType:ch:v8i8
          t44: v2f64 = bitcast t39
              t40: v8i16 = X86ISD::VTRUNC t6
            t41: v8i16 = sign_extend_inreg t40, ValueType:ch:v8i8
          t49: v2f64 = bitcast t41
        t59: v4f64 = concat_vectors t44, t49
      t68: v4f64 = vector_shuffle<4,6,2,3> t63, t59
      t3: i32,ch = load<(load 4 from %fixed-stack.0)> t0, FrameIndex:i32<-1>, undef:i32
    t57: ch = store<(store 32 into %ir.10, align 2)> t21, t68, t3, undef:i32
  t29: ch = X86ISD::RET_FLAG t57, TargetConstant:i32<0>

After this patch:
SelectionDAG has 26 nodes:
  t0: ch = EntryToken
  t5: v4i64,ch = load<(load 32 from `<4 x i64>* null`, align 8)> t0, Constant:i32<0>, undef:i32
  t6: v4i64,ch = load<(load 32 from `<4 x i64>* undef`, align 8)> t0, undef:i32, undef:i32
      t21: ch = TokenFactor t5:1, t6:1
              t37: v8i16 = X86ISD::VTRUNC t5
            t39: v8i16 = sign_extend_inreg t37, ValueType:ch:v8i8
          t44: v2f64 = bitcast t39
              t40: v8i16 = X86ISD::VTRUNC t6
            t41: v8i16 = sign_extend_inreg t40, ValueType:ch:v8i8
          t49: v2f64 = bitcast t41
        t59: v4f64 = concat_vectors t44, t49
            t36: v8i16 = BUILD_VECTOR Constant:i16<0>, Constant:i16<0>, Constant:i16<0>, Constant:i16<0>, undef:i16, undef:i16, undef:i16, undef:i16
          t52: v2f64 = bitcast t36
        t62: v4f64 = concat_vectors t52, undef:v2f64
      t64: v4f64 = vector_shuffle<0,2,4,4> t59, t62
      t3: i32,ch = load<(load 4 from %fixed-stack.0)> t0, FrameIndex:i32<-1>, undef:i32
    t57: ch = store<(store 32 into %ir.10, align 2)> t21, t64, t3, undef:i32
  t29: ch = X86ISD::RET_FLAG t57, TargetConstant:i32<0>
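
To compare the two final shuffles (t68 before, t64 after), they can be evaluated lane by lane. Below is a minimal sketch, assuming the usual vector_shuffle mask semantics (for N-element operands, index i < N selects from the first operand, index i >= N selects from the second, and "u" is undef). The lane labels A0, A1, B0, B1 and Z are made up for illustration and do not appear in the dumps; the sketch also assumes the v8i16 BUILD_VECTOR bitcasts to a v2f64 whose lane 0 is all zero bits and whose lane 1 is undef.

```python
UNDEF = "u"

def shuffle(mask, lhs, rhs):
    """Evaluate a vector_shuffle mask over two equally sized operands.

    An index below len(lhs) picks from lhs, a larger index picks from rhs,
    and None stands for an undef ("u") mask element.
    """
    n = len(lhs)
    out = []
    for idx in mask:
        if idx is None:
            out.append(UNDEF)
        elif idx < n:
            out.append(lhs[idx])
        else:
            out.append(rhs[idx - n])
    return out

# Hypothetical lane labels (assumptions, not taken from the dumps):
#   A0, A1 = the two f64 lanes of t44, B0, B1 = the two f64 lanes of t49
#   Z      = the all-zero f64 lane produced by bitcasting t36
t59 = ["A0", "A1", "B0", "B1"]       # concat_vectors t44, t49
t62 = ["Z", UNDEF, UNDEF, UNDEF]     # concat_vectors t52, undef:v2f64
undef_v4 = [UNDEF] * 4

# Before the patch: two chained shuffles.
t63 = shuffle([None, None, 0, 0], t62, undef_v4)   # vector_shuffle<u,u,0,0>
before = shuffle([4, 6, 2, 3], t63, t59)           # vector_shuffle<4,6,2,3>

# After the patch: a single shuffle with the operands swapped.
after = shuffle([0, 2, 4, 4], t59, t62)            # vector_shuffle<0,2,4,4>

print("before:", before)
print("after: ", after)
```

Under these assumptions both forms select the lanes A0, B0, Z, Z, so the two DAGs appear to describe the same value at this level. This check says nothing about whether the X86 instructions selected from the new DAG are correct.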

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D96405