[PATCH] D109065: [X86] combineX86ShufflesRecursively(): call SimplifyMultipleUseDemandedVectorElts() on after finishing recursing
Roman Lebedev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Sep 13 15:10:36 PDT 2021
lebedev.ri added inline comments.
================
Comment at: llvm/test/CodeGen/X86/insertelement-ones.ll:389
+; SSE2-NEXT: pandn %xmm3, %xmm5
+; SSE2-NEXT: por %xmm5, %xmm1
; SSE2-NEXT: pand %xmm2, %xmm1
----------------
lebedev.ri wrote:
> RKSimon wrote:
> > Any luck on improving this?
> This one is obscure.
> I believe the problem is `X86ISelLowering.cpp`'s `matchBinaryShuffle()`'s `ISD::OR` lowering.
>
> We have:
> ```
> mask: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 30 -2
>
> matchBinaryShuffle()
> EltSizeInBits: 8
> V1:
> t4: v16i8,ch = CopyFromReg t0, Register:v16i8 %1
> t3: v16i8 = Register %1
> V2:
> t74: v16i8 = X86ISD::VSHLDQ t51, TargetConstant:i8<14>
> t51: v16i8 = bitcast t50
> t50: v4i32 = scalar_to_vector Constant:i32<255>
> t49: i32 = Constant<255>
> t73: i8 = TargetConstant<14>
> ```
>
> We can't say anything about `t4`, but I think it's obvious that `t74` is
> all zeros except for the 14th element, which is all ones.
> So we can of course lower that as an `or` blend, and we do not care what `t4` is.
> But the code fails to do that.
>
> I think we'd basically have to do `computeKnownBits()` for each element of V1/V2 separately.
>
> Should I keep looking?
Ok, got it: D109726
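
To illustrate the per-element idea from the quoted comment, here is a minimal standalone sketch. It is a toy model, not the code in D109726 and not LLVM's API: `EltKnownBits`, `VecKnownBits`, `collapse()` and `canLowerAsOr()` are hypothetical names, and the sketch assumes the current failure comes from using one known-bits answer for the whole vector. It does not model negative sentinel mask entries and conservatively gives up on them.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t NumElts = 16; // v16i8, as in the dump above

// Hypothetical toy stand-in for known bits of ONE 8-bit element.
struct EltKnownBits {
  std::uint8_t Zero; // bits known to be 0
  std::uint8_t One;  // bits known to be 1
  bool isAllZeros() const { return Zero == 0xFF; }
  bool isAllOnes() const { return One == 0xFF; }
};

using VecKnownBits = std::array<EltKnownBits, NumElts>;

// What a single whole-vector answer can report: only bits that are
// known in every element survive the intersection.
EltKnownBits collapse(const VecKnownBits &V) {
  EltKnownBits R{0xFF, 0xFF};
  for (const EltKnownBits &E : V) {
    R.Zero &= E.Zero;
    R.One &= E.One;
  }
  return R;
}

// Returns true if shuffle(V1, V2, Mask) equals (V1 | V2) in every element.
// Mask[I] == I selects V1[I]; Mask[I] == I + NumElts selects V2[I].
// Any other entry (including negative sentinels) is rejected conservatively.
bool canLowerAsOr(const std::array<int, NumElts> &Mask,
                  const VecKnownBits &V1, const VecKnownBits &V2) {
  for (std::size_t I = 0; I != NumElts; ++I) {
    const int M = Mask[I];
    if (M == static_cast<int>(I)) {
      // Want V1[I]: the OR is unchanged if V2[I] is zero,
      // or if V1[I] is already all ones.
      if (!V2[I].isAllZeros() && !V1[I].isAllOnes())
        return false;
    } else if (M == static_cast<int>(I + NumElts)) {
      // Want V2[I]: symmetric case.
      if (!V1[I].isAllZeros() && !V2[I].isAllOnes())
        return false;
    } else {
      return false;
    }
  }
  return true;
}
```

With V1 entirely unknown and V2 known to be zero everywhere except an all-ones element 14, `canLowerAsOr()` accepts a simplified mask `0 1 ... 13 30 15` (the last entry is replaced by a plain index here because the sketch does not model sentinels). `collapse(V2)`, by contrast, reports nothing known, which is why a whole-vector query cannot justify the `or`.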
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D109065/new/
https://reviews.llvm.org/D109065