[PATCH] D109065: [X86] combineX86ShufflesRecursively(): call SimplifyMultipleUseDemandedVectorElts() on after finishing recursing

Mon Sep 13 15:10:36 PDT 2021

lebedev.ri added inline comments.

================
Comment at: llvm/test/CodeGen/X86/insertelement-ones.ll:389
+; SSE2-NEXT:    pandn %xmm3, %xmm5
+; SSE2-NEXT:    por %xmm5, %xmm1
 ; SSE2-NEXT:    pand %xmm2, %xmm1
----------------
lebedev.ri wrote:
> RKSimon wrote:
> > Any luck on improving this?
> This one is obscure.
> I believe the problem is the `ISD::OR` lowering in `matchBinaryShuffle()` in `X86ISelLowering.cpp`.
> 
> We have:
> ```
> mask:  0 1 2 3 4 5 6 7 8 9 10 11 12 13 30 -2
> 
> matchBinaryShuffle()
> EltSizeInBits: 8
> V1:
> t4: v16i8,ch = CopyFromReg t0, Register:v16i8 %1
>   t3: v16i8 = Register %1
> V2:
> t74: v16i8 = X86ISD::VSHLDQ t51, TargetConstant:i8<14>
>   t51: v16i8 = bitcast t50
>     t50: v4i32 = scalar_to_vector Constant:i32<255>
>       t49: i32 = Constant<255>
>   t73: i8 = TargetConstant<14>
> ```
> 
> We can't say anything about `t4`, but I think it's clear that `t74` is
> all-zeros except for element 14, which is all-ones.
> So we can lower this as an `or` blend: in element 14 the all-ones value from `t74` wins, and we do not care what `t4` is.
> But the code fails to do that.
> 
> I think we'd have to run `computeKnownBits()` on each element of `V1`/`V2` separately.
> 
> Should I keep looking?
Ok, got it: D109726
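
For reference, here is a minimal standalone sketch of the per-element reasoning from the quoted comment. It is not the code from D109726 and does not use any LLVM API. The names `Lane`, `Vec`, `scalarToVectorBytes`, `byteShiftLeft` and `laneOkForOr` are hypothetical, made up for illustration. It models `t74` byte by byte (assuming little-endian layout and that `X86ISD::VSHLDQ` on a 128-bit vector is a whole-register byte shift left with zero fill), then checks a single lane of an `or` blend. The zeroable lane 15 (`-2` in the mask) is left out of the sketch.

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Hypothetical model: one byte per v16i8 lane; std::nullopt means "unknown/undef".
using Lane = std::optional<std::uint8_t>;
using Vec = std::array<Lane, 16>;

// Models t50/t51: scalar_to_vector of a 32-bit constant, viewed as 16 bytes.
// Only the low 4 bytes are defined (little-endian); the rest stay undef.
Vec scalarToVectorBytes(std::uint32_t Value) {
  Vec V{}; // every lane starts out as std::nullopt
  for (unsigned I = 0; I != 4; ++I)
    V[I] = static_cast<std::uint8_t>(Value >> (8 * I));
  return V;
}

// Models a whole-register byte shift left by Amt bytes: the low Amt lanes
// become zero, and the high lanes of the input are shifted out.
Vec byteShiftLeft(const Vec &In, unsigned Amt) {
  Vec Out;
  for (unsigned I = 0; I != 16; ++I)
    Out[I] = I < Amt ? Lane(std::uint8_t{0}) : In[I - Amt];
  return Out;
}

// Returns true if OR-ing the two lanes is known to produce the lane the
// blend mask asks for. WantV2 says which source the mask picks for this lane.
bool laneOkForOr(bool WantV2, Lane V1, Lane V2) {
  const Lane &Want = WantV2 ? V2 : V1;
  const Lane &Other = WantV2 ? V1 : V2;
  // Other lane is known zero: OR passes the wanted lane through unchanged.
  if (Other && *Other == 0x00)
    return true;
  // Wanted lane is known all-ones: OR yields all-ones whatever the other is.
  if (Want && *Want == 0xFF)
    return true;
  return false;
}
```

With `byteShiftLeft(scalarToVectorBytes(255), 14)` standing in for `t74`, lanes 0 to 13 come out as known zero and lane 14 as `0xFF`, so `laneOkForOr` accepts those lanes even when the matching `t4` lane is unknown. The second check (wanted lane known all-ones) is the case the quoted comment says the existing matching misses.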

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109065/new/

https://reviews.llvm.org/D109065