[PATCH] D109065: [X86] combineX86ShufflesRecursively(): call SimplifyMultipleUseDemandedVectorElts() on after finishing recursing

Mon Sep 13 13:44:29 PDT 2021

lebedev.ri added inline comments.

================
Comment at: llvm/test/CodeGen/X86/insertelement-ones.ll:389
+; SSE2-NEXT:    pandn %xmm3, %xmm5
+; SSE2-NEXT:    por %xmm5, %xmm1
 ; SSE2-NEXT:    pand %xmm2, %xmm1
----------------
RKSimon wrote:
> Any luck on improving this?
This one is obscure.
I believe the problem is the `ISD::OR` matching in `matchBinaryShuffle()` (`X86ISelLowering.cpp`).

We have:
```
mask:  0 1 2 3 4 5 6 7 8 9 10 11 12 13 30 -2

matchBinaryShuffle()
EltSizeInBits: 8
V1:
t4: v16i8,ch = CopyFromReg t0, Register:v16i8 %1
  t3: v16i8 = Register %1
V2:
t74: v16i8 = X86ISD::VSHLDQ t51, TargetConstant:i8<14>
  t51: v16i8 = bitcast t50
    t50: v4i32 = scalar_to_vector Constant:i32<255>
      t49: i32 = Constant<255>
  t73: i8 = TargetConstant<14>
```

We can't say anything about `t4`, but I think it's obvious that `t74` is actually
all-zeros except for element 14, which is all-ones.
So we can of course lower that as an `or` blend, and we do not care what `t4` is.
But the code fails to do that.
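
To spell out why `t74` has that value, here is a small standalone C++ model of the three nodes (`scalar_to_vector`, `bitcast`, `X86ISD::VSHLDQ`). It is only a sketch, not LLVM code: the helper names and the `UndefFill` parameter are made up for illustration, and it assumes a little-endian byte layout. The undef upper lanes of `t50` are shifted out entirely, so they cannot affect the result.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V16i8 = std::array<uint8_t, 16>;

// scalar_to_vector + bitcast to v16i8, little-endian: the scalar fills bytes
// 0..3. The remaining bytes are undef in the DAG; UndefFill (hypothetical
// parameter) stands in for whatever they might contain.
V16i8 scalarToVectorBytes(uint32_t Scalar, uint8_t UndefFill) {
  V16i8 Bytes;
  Bytes.fill(UndefFill);
  for (unsigned I = 0; I != 4; ++I)
    Bytes[I] = static_cast<uint8_t>(Scalar >> (8 * I));
  return Bytes;
}

// X86ISD::VSHLDQ (pslldq): shift the whole register towards higher byte
// indices by Amt bytes, filling the vacated low bytes with zero.
V16i8 byteShiftLeft(const V16i8 &In, unsigned Amt) {
  V16i8 Out{}; // value-initialized: all bytes start as zero
  for (unsigned I = Amt; I < 16; ++I)
    Out[I] = In[I - Amt];
  return Out;
}

// Model of t74 from the dump above: shift the widened constant 255 by 14 bytes.
V16i8 modelT74(uint8_t UndefFill) {
  return byteShiftLeft(scalarToVectorBytes(255, UndefFill), 14);
}

// True if byte 14 is all-ones and every other byte is zero.
bool onlyByte14IsAllOnes(const V16i8 &V) {
  for (unsigned I = 0; I != 16; ++I)
    if (V[I] != (I == 14 ? 0xFF : 0x00))
      return false;
  return true;
}

// The result must not depend on the contents of the undef lanes.
void checkT74() {
  assert(onlyByte14IsAllOnes(modelT74(0x00)));
  assert(onlyByte14IsAllOnes(modelT74(0xAA)));
}
```

Byte 0 of the input (0xFF) lands in byte 14 and byte 1 (0x00) lands in byte 15; everything else is zero fill from the shift.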

I think we'd basically have to do `computeKnownBits()` for each element of V1/V2 separately.
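
Roughly what I have in mind, as a standalone C++ sketch rather than a patch: `LaneKnown`, `laneOkForOr()` and `maskOkForOr()` are hypothetical names, and `LaneKnown` only loosely mirrors the `Zero`/`One` masks of `KnownBits`. The idea is that a lane taken from one input is fine for `or` if the other input's lane is known zero, or if the taken lane is itself known all-ones.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical per-lane summary of what is known about an i8 element.
struct LaneKnown {
  uint8_t Zero = 0; // bits known to be 0
  uint8_t One = 0;  // bits known to be 1
  bool isZero() const { return Zero == 0xFF; }
  bool isAllOnes() const { return One == 0xFF; }
};

constexpr int SentinelUndef = -1; // same values as SM_SentinelUndef
constexpr int SentinelZero = -2;  // and SM_SentinelZero
constexpr int NumLanes = 16;

using LaneInfo = std::array<LaneKnown, NumLanes>;

// Can result lane I, described by mask element M, be produced by (V1 | V2)?
bool laneOkForOr(int M, int I, const LaneInfo &K1, const LaneInfo &K2) {
  if (M == SentinelUndef)
    return true; // any value will do
  if (M == SentinelZero)
    return K1[I].isZero() && K2[I].isZero(); // conservative in this sketch
  if (M == I) // lane comes from V1
    return K2[I].isZero() || K1[I].isAllOnes();
  if (M == I + NumLanes) // lane comes from V2
    return K1[I].isZero() || K2[I].isAllOnes();
  return false; // element moves between lanes, so this is not a blend
}

// Every lane of the mask must pass the per-lane check.
bool maskOkForOr(const std::array<int, NumLanes> &Mask, const LaneInfo &K1,
                 const LaneInfo &K2) {
  for (int I = 0; I < NumLanes; ++I)
    if (!laneOkForOr(Mask[I], I, K1, K2))
      return false;
  return true;
}

// Lane 14 of the case above: nothing known about V1, V2 lane is all-ones.
void checkLane14() {
  LaneInfo K1{}, K2{};
  K2[14].One = 0xFF;
  assert(laneOkForOr(30, 14, K1, K2));
}
```

With nothing known about V1, lane 14 (mask element 30) passes only because V2's lane is known all-ones; a check that requires the other input's lane to be known zero would reject it. The zero-sentinel lane is handled conservatively here and still needs both inputs known zero.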

Should I keep looking?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109065/new/

https://reviews.llvm.org/D109065