[PATCH] D109065: [X86] combineX86ShufflesRecursively(): call SimplifyMultipleUseDemandedVectorElts() on after finishing recursing

Fri Sep 17 09:30:03 PDT 2021

lebedev.ri added inline comments.

================
Comment at: llvm/test/CodeGen/X86/insertelement-ones.ll:311

 define <16 x i8> @insert_v16i8_x123456789ABCDEx(<16 x i8> %a) {
 ; SSE2-LABEL: insert_v16i8_x123456789ABCDEx:
----------------
Here we have:
```
Optimized legalized selection DAG: %bb.0 'insert_v16i8_x123456789ABCDEx:'
SelectionDAG has 20 nodes:
  t0: ch = EntryToken
          t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %0
        t19: v16i8 = and t2, t36
        t20: v16i8 = X86ISD::ANDNP t36, t27
      t21: v16i8 = or t19, t20
      t33: v16i8 = X86ISD::VSHLDQ t27, TargetConstant:i8<15>
    t45: v16i8 = or t21, t33
  t12: ch,glue = CopyToReg t0, Register:v16i8 $xmm0, t45
    t26: v4i32 = scalar_to_vector Constant:i32<255>
  t27: v16i8 = bitcast t26
    t38: i64 = X86ISD::Wrapper TargetConstantPool:i64<<16 x i8> <i8 0, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>> 0
  t36: v16i8,ch = load<(load (s128) from constant-pool)> t0, t38, undef:i64
  t13: ch = X86ISD::RET_FLAG t12, TargetConstant:i32<0>, Register:v16i8 $xmm0, t12:1
```
... so `matchBinaryShuffle()` again fails to omit the masking,
even though it's obviously redundant here for the reasons seen in D109726.
I suspect that is because around `scalar_to_vector` we operate on the i32 element type,
so we don't see all-ones elements until after the `bitcast` to v16i8.
Short of changing `computeKnownBits` to operate on a specified element width,
I'm not sure it can help us further, and that does not sound like the right fix.
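
To make the element-width point concrete, here is a rough byte-level model of the DAG above in Python. It is only a sketch under stated assumptions: undefined lanes are modelled as 0, the helper names (`dag_result`, `vshldq`, and so on) are made up for illustration and are not LLVM APIs, and "the masking" is taken to mean the `and t2, t36` on the input, which is my reading of the dump rather than something the DAG states.

```python
# Byte-level model of the DAG above. x86 is little-endian, so byte 0 of the
# v16i8 is the low byte of i32 lane 0. Undefined lanes are modelled as 0,
# which is an assumption of this sketch, not something the DAG guarantees.

T36_MASK = [0x00] + [0xFF] * 15  # the constant-pool mask loaded by t36


def scalar_to_vector_i32(value):
    """t26: v4i32 = scalar_to_vector value; lanes 1..3 are undefined (0 here)."""
    return [value & 0xFFFFFFFF, 0, 0, 0]


def bitcast_v4i32_to_v16i8(lanes):
    """t27: reinterpret four i32 lanes as sixteen i8 lanes, little-endian."""
    return [(lane >> (8 * b)) & 0xFF for lane in lanes for b in range(4)]


def vshldq(vec, n):
    """X86ISD::VSHLDQ: whole-register left shift by n bytes, zeros shifted in."""
    return [0] * n + vec[:16 - n]


def dag_result(a, keep_and_mask=True):
    """Evaluate t45 for input a; optionally drop the `and t2, t36` masking."""
    t27 = bitcast_v4i32_to_v16i8(scalar_to_vector_i32(255))
    t19 = [x & m for x, m in zip(a, T36_MASK)] if keep_and_mask else list(a)
    t20 = [(~m & 0xFF) & y for m, y in zip(T36_MASK, t27)]  # ANDNP = ~lhs & rhs
    t21 = [x | y for x, y in zip(t19, t20)]
    t33 = vshldq(t27, 15)
    return [x | y for x, y in zip(t21, t33)]


a = list(range(0x10, 0x20))
# Dropping the AND on the input does not change the result, because byte 0
# is OR'ed with 0xFF afterwards anyway.
assert dag_result(a, keep_and_mask=True) == dag_result(a, keep_and_mask=False)

# Element-width view: lane 0 is not all-ones as i32, but byte 0 is as i8.
lane0 = scalar_to_vector_i32(255)[0]
byte0 = bitcast_v4i32_to_v16i8([lane0, 0, 0, 0])[0]
print(lane0 == 0xFFFFFFFF, byte0 == 0xFF)  # prints: False True
```

In this model the inserted byte is all-ones only when viewed as i8; viewed as i32, lane 0 is `0x000000FF`, so an all-ones check at that granularity cannot succeed.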

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D109065