[PATCH] D66456: [DAGCombiner][X86] Teach visitCONCAT_VECTORS to combine (concat_vectors (concat_vectors X, Y), undef)) -> (concat_vectors X, Y, undef, undef)

Tue Aug 20 10:07:30 PDT 2019

craig.topper marked an inline comment as done.
craig.topper added inline comments.

================
Comment at: llvm/test/CodeGen/X86/avx512vl-vec-masked-cmp.ll:7593
 ; NoVLX-NEXT:    kmovw %k0, %eax
-; NoVLX-NEXT:    andl $3, %eax
+; NoVLX-NEXT:    movzbl %al, %eax
 ; NoVLX-NEXT:    vzeroupper
----------------
craig.topper wrote:
> RKSimon wrote:
> > This looks like we're missing a computeKnownBitsForTargetNode handling for a X86ISD opcode?
> There aren’t any target nodes here. The kshifts should be coming from isel for an insert_subvector. I think we’re probably missing a combine for insert_subvector into zero followed by an insert into undef. Maybe with an extract between them.
For these cases we end up with a DAG like this.

t33: v16i1 = BUILD_VECTOR Constant:i8<0>, Constant:i8<0>, Constant:i8<0>,  ....
t6: v2i1 = setcc t2, t4, seteq:ch
          t34: v16i1 = insert_subvector t33, t6, Constant:i64<0>
        t35: v8i1 = extract_subvector t34, Constant:i64<0>
      t18: i8 = bitcast t35
    t36: i32 = zero_extend t18

We can't simplify the t33 input to the insert_subvector, since t6 is only 2 bits wide. We can probably add a combine that turns the bitcast into a v16i1->i16 bitcast taken directly from the insert_subvector, which gets rid of the extract. The zero_extend would then be from i16 to i32, which we should be able to optimize out through the existing isel pattern that uses KMOVW.
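
As a sanity check on that idea, here is a small bit-level model of the two DAG shapes. It is only a Python sketch, not LLVM code: the helper names are made up for illustration, and it assumes element 0 of a vXi1 vector becomes bit 0 of the integer after the bitcast.

```python
from itertools import product


def insert_subvector(vec, sub, idx):
    """Model of insert_subvector: overwrite len(sub) lanes of vec at idx."""
    out = list(vec)
    out[idx:idx + len(sub)] = sub
    return out


def extract_subvector(vec, idx, width):
    """Model of extract_subvector: take width lanes of vec starting at idx."""
    return vec[idx:idx + width]


def bitcast_to_int(vec):
    """Model of a vXi1 -> iX bitcast (assumption: lane 0 is bit 0)."""
    return sum(int(bit) << i for i, bit in enumerate(vec))


def current_dag(t6):
    """insert into zero v16i1, extract v8i1, bitcast to i8, zext to i32."""
    t33 = [False] * 16
    t34 = insert_subvector(t33, t6, 0)
    t35 = extract_subvector(t34, 0, 8)
    t18 = bitcast_to_int(t35)
    return t18 & 0xFFFFFFFF  # zero_extend i8 -> i32


def proposed_dag(t6):
    """insert into zero v16i1, bitcast to i16, zext to i32 (no extract)."""
    t33 = [False] * 16
    t34 = insert_subvector(t33, t6, 0)
    wide = bitcast_to_int(t34)
    return wide & 0xFFFFFFFF  # zero_extend i16 -> i32


def paths_agree():
    """Compare both shapes for every possible v2i1 setcc result."""
    return all(
        current_dag(list(t6)) == proposed_dag(list(t6))
        for t6 in product([False, True], repeat=2)
    )
```

In this model the two shapes agree for all four possible values of t6, and the result never exceeds 3, which is consistent with the `andl $3` in the old output.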

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66456/new/

https://reviews.llvm.org/D66456