[PATCH] D109065: [X86] combineX86ShufflesRecursively(): call SimplifyMultipleUseDemandedVectorElts() on after finishing recursing
Roman Lebedev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Sep 13 13:44:29 PDT 2021
lebedev.ri added inline comments.
================
Comment at: llvm/test/CodeGen/X86/insertelement-ones.ll:389
+; SSE2-NEXT: pandn %xmm3, %xmm5
+; SSE2-NEXT: por %xmm5, %xmm1
; SSE2-NEXT: pand %xmm2, %xmm1
----------------
RKSimon wrote:
> Any luck on improving this?
This one is obscure.
I believe the problem is the `ISD::OR` lowering in `matchBinaryShuffle()` in `X86ISelLowering.cpp`.
We have:
```
mask: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 30 -2
matchBinaryShuffle()
EltSizeInBits: 8
V1:
t4: v16i8,ch = CopyFromReg t0, Register:v16i8 %1
t3: v16i8 = Register %1
V2:
t74: v16i8 = X86ISD::VSHLDQ t51, TargetConstant:i8<14>
t51: v16i8 = bitcast t50
t50: v4i32 = scalar_to_vector Constant:i32<255>
t49: i32 = Constant<255>
t73: i8 = TargetConstant<14>
```
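For readers less used to these dumps, here is a minimal standalone sketch of how the mask above reads lane by lane. It is not LLVM code: `describeLane` is a hypothetical helper, and the only things taken from the backend are the two sentinel values (`-1` for undef, `-2` for a lane that must be zero) and the usual convention that indices `NumElts..2*NumElts-1` select from V2.

```cpp
#include <cassert>
#include <string>

// Sentinel values, named after the ones used by the X86 shuffle code.
constexpr int SM_SentinelUndef = -1; // lane may hold anything
constexpr int SM_SentinelZero = -2;  // lane must be zero

// Hypothetical helper: describe where one result lane of a two-input
// shuffle comes from. NumElts is the lane count of each input
// (16 for v16i8).
std::string describeLane(int M, int NumElts) {
  assert(M >= SM_SentinelZero && M < 2 * NumElts && "mask index out of range");
  if (M == SM_SentinelUndef)
    return "undef";
  if (M == SM_SentinelZero)
    return "zero";
  if (M < NumElts)
    return "V1[" + std::to_string(M) + "]";
  return "V2[" + std::to_string(M - NumElts) + "]";
}
```

Read this way, lanes 0..13 of the mask stay in place from V1, lane 14 takes element 14 of V2 (index 30), and lane 15 is the zero sentinel.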
We can't say anything about `t4`, but I think it's obvious that `t74` is
all-zeros except for element 14, which is all-ones.
So we can of course lower that as an `or` blend, and we do not care what `t4` is.
But the code fails to do that.
I think we'd basically have to do `computeKnownBits()` for each element of V1/V2 separately.
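To make the per-element idea concrete, here is a small standalone model of it. None of this is LLVM API: `KnownByte`, `KnownVec`, `knownScalarToVector255`, `knownByteShiftLeft` and `orPreservesLane` are hypothetical names, and the model assumes little-endian byte order and treats `VSHLDQ` as a whole-register byte shift that shifts in zeros.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical per-byte known-bits record (not an LLVM type).
struct KnownByte {
  uint8_t Zero; // bits known to be 0
  uint8_t One;  // bits known to be 1
  bool isAllZeros() const { return Zero == 0xFF; }
  bool isAllOnes() const { return One == 0xFF; }
};
using KnownVec = std::array<KnownByte, 16>; // value-initialized = unknown

// Model of t51: i32 255 in the low 4 bytes (little endian),
// bytes 4..15 left unknown.
KnownVec knownScalarToVector255() {
  KnownVec K{};
  K[0] = KnownByte{0x00, 0xFF};
  K[1] = K[2] = K[3] = KnownByte{0xFF, 0x00};
  return K;
}

// Model of a byte shift left by Amt: result byte i is source byte
// i - Amt, and the low Amt bytes are known zero.
KnownVec knownByteShiftLeft(const KnownVec &Src, unsigned Amt) {
  KnownVec K{};
  for (unsigned i = 0; i < 16; ++i)
    K[i] = i < Amt ? KnownByte{0xFF, 0x00} : Src[i - Amt];
  return K;
}

// Does V1 | V2 produce what mask entry M asks for in lane i?
bool orPreservesLane(unsigned i, int M, const KnownVec &V1,
                     const KnownVec &V2) {
  if (M == -1) // undef lane: anything is fine
    return true;
  if (M == -2) // zero lane: both inputs must be known zero here
    return V1[i].isAllZeros() && V2[i].isAllZeros();
  bool FromV1 = M < 16;
  unsigned Src = FromV1 ? M : M - 16;
  if (Src != i) // an OR cannot move elements between lanes
    return false;
  const KnownByte &Taken = FromV1 ? V1[i] : V2[i];
  const KnownByte &Other = FromV1 ? V2[i] : V1[i];
  // Either the other input contributes nothing, or the taken
  // element is all-ones and therefore absorbs it.
  return Other.isAllZeros() || Taken.isAllOnes();
}
```

With `V2 = knownByteShiftLeft(knownScalarToVector255(), 14)`, this model reports bytes 0..13 and 15 as known zero and byte 14 as known all-ones, so lanes 0..14 of the mask pass with a completely unknown V1. Lane 15 (the `-2` zero sentinel) is handled conservatively here and passes only if V1's byte 15 is also known zero; the sketch makes no claim about how the real code treats that lane.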
Should I keep looking?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D109065/new/
https://reviews.llvm.org/D109065