[PATCH] D104156: [DAGCombine][X86][ARM] EXTRACT_SUBVECTOR(VECTOR_SHUFFLE(?,?,Mask)) -> VECTOR_SHUFFLE(EXTRACT_SUBVECTOR(?, ?), EXTRACT_SUBVECTOR(?, ?), Mask')

Fri Oct 29 05:30:34 PDT 2021

lebedev.ri added a comment.

@RKSimon do you have any concrete arguments against this?

I'm seeing this pattern (well, roughly) when looking at https://bugs.llvm.org/show_bug.cgi?id=52337:

  Optimized type-legalized selection DAG: %bb.0 'mask_i32_stride8_vf4:'
  SelectionDAG has 56 nodes:
    t0: ch = EntryToken
    t6: i64,ch = CopyFromReg t0, Register:i64 %2
              t294: v16i8 = vector_shuffle<0,u,0,u,0,u,0,u,0,u,0,u,0,u,0,u> t282, undef:v16i8
            t255: v8i16 = bitcast t294
          t236: v8i32 = any_extend t255
        t261: v8i32 = shl t236, t259
      t194: v8i32,ch = masked_load<(load (s256) from %ir.ptr, align 4)> t0, t6, undef:i64, t261, undef:v8i32
    t29: ch,glue = CopyToReg t0, Register:v8i32 $ymm0, t194
        t195: i64 = add t6, Constant:i64<32>
              t292: v16i8 = vector_shuffle<4,u,4,u,4,u,4,u,4,u,4,u,4,u,4,u> t282, undef:v16i8
            t257: v8i16 = bitcast t292
          t216: v8i32 = any_extend t257
        t260: v8i32 = shl t216, t259
      t196: v8i32,ch = masked_load<(load (s256) from %ir.ptr + 32, align 4)> t0, t195, undef:i64, t260, undef:v8i32
    t31: ch,glue = CopyToReg t29, Register:v8i32 $ymm1, t196, t29:1
        t62: i64 = add t6, Constant:i64<64>
            t268: v8i16 = any_extend_vector_inreg t264
          t173: v8i32 = any_extend t268
        t281: v8i32 = shl t173, t259
      t129: v8i32,ch = masked_load<(load (s256) from %ir.ptr + 64, align 4)> t0, t62, undef:i64, t281, undef:v8i32
    t33: ch,glue = CopyToReg t31, Register:v8i32 $ymm2, t129, t31:1
        t280: i64 = add t6, Constant:i64<96>
              t275: v16i8 = vector_shuffle<8,u,9,u,10,u,11,u,12,u,13,u,14,u,15,u> t264, undef:v16i8
            t272: v8i16 = bitcast t275
          t152: v8i32 = any_extend t272
        t278: v8i32 = shl t152, t259
      t132: v8i32,ch = masked_load<(load (s256) from %ir.ptr + 96, align 4)> t0, t280, undef:i64, t278, undef:v8i32
    t35: ch,glue = CopyToReg t33, Register:v8i32 $ymm3, t132, t33:1
    t259: v8i32 = BUILD_VECTOR Constant:i32<31>, Constant:i32<31>, Constant:i32<31>, Constant:i32<31>, Constant:i32<31>, Constant:i32<31>, Constant:i32<31>, Constant:i32<31>
        t287: v32i8 = concat_vectors t282, undef:v16i8
      t289: v32i8 = vector_shuffle<u,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u,8,8,8,8,8,8,8,8,12,12,12,12,12,12,12,12> t287, undef:v32i8
    t264: v16i8 = extract_subvector t289, Constant:i64<16>
          t2: i64,ch = CopyFromReg t0, Register:i64 %0
        t9: v4i32,ch = load<(load (s128) from %ir.in.vec, align 32)> t0, t2, undef:i64
        t11: v4i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
      t42: v4i32 = setcc t9, t11, setlt:ch
    t282: v16i8 = bitcast t42
    t36: ch = X86ISD::RET_FLAG t35, TargetConstant:i32<0>, Register:v8i32 $ymm0, Register:v8i32 $ymm1, Register:v8i32 $ymm2, Register:v8i32 $ymm3, t35:1

i.e. this subgraph:

      t287: v32i8 = concat_vectors t282, undef:v16i8
    t289: v32i8 = vector_shuffle<u,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u,8,8,8,8,8,8,8,8,12,12,12,12,12,12,12,12> t287, undef:v32i8
  t264: v16i8 = extract_subvector t289, Constant:i64<16>
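
To make the effect on the mask concrete, here is a rough standalone sketch (plain Python, not the DAGCombine code from this patch) of how the wide mask could be narrowed for the three nodes above. The function name `narrow_extracted_shuffle` and the legality rule it uses (each shuffle operand may contribute lanes from only one aligned subvector) are assumptions made for illustration; the actual conditions in the patch may differ.

```python
UNDEF = -1  # stands in for the 'u' lanes in the DAG dump


def narrow_extracted_shuffle(mask, extract_idx, sub_elts):
    """Return (lhs_offset, rhs_offset, new_mask), or None if not foldable.

    mask        : wide shuffle mask; entries are in [0, 2*n) or UNDEF,
                  where n = len(mask) is the lane count of each operand
    extract_idx : first lane kept by extract_subvector
    sub_elts    : lane count of the extracted subvector
    """
    n = len(mask)
    demanded = mask[extract_idx:extract_idx + sub_elts]
    chunk = [None, None]  # aligned chunk demanded from LHS / RHS, if any
    new_mask = []
    for m in demanded:
        if m == UNDEF:
            new_mask.append(UNDEF)
            continue
        op, lane = divmod(m, n)          # op: 0 = LHS, 1 = RHS
        c, off = divmod(lane, sub_elts)  # aligned chunk and offset inside it
        if chunk[op] is None:
            chunk[op] = c
        elif chunk[op] != c:
            return None  # one operand would need two different subvectors
        new_mask.append(op * sub_elts + off)
    # Offset of the subvector to extract from each operand (None = unused).
    offsets = [None if c is None else c * sub_elts for c in chunk]
    return offsets[0], offsets[1], new_mask


# Mask of t289, extracted at lane 16 as v16i8 (t264).
t289_mask = [UNDEF] * 16 + [8] * 8 + [12] * 8
result = narrow_extracted_shuffle(t289_mask, 16, 16)
```

For t289/t264 this gives LHS offset 0, no RHS use, and the mask <8,8,8,8,8,8,8,8,12,12,12,12,12,12,12,12>. Since the LHS is concat_vectors(t282, undef), extracting its low half is just t282, so t264 would become a v16i8 shuffle of t282 with that mask.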

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104156/new/

https://reviews.llvm.org/D104156