[PATCH] D79886: [DAGCombiner] try to move splat after binop with splat constant
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon May 25 13:24:22 PDT 2020
spatel marked an inline comment as done.
spatel added inline comments.
================
Comment at: llvm/test/CodeGen/X86/vector-rotate-128.ll:854
; SSE2: # %bb.0:
-; SSE2-NEXT: pshuflw {{.*#+}} xmm1 = xmm1[0,0,2,3,4,5,6,7]
-; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[0,0,0,0]
+; SSE2-NEXT: movl $65535, %eax # imm = 0xFFFF
+; SSE2-NEXT: movd %eax, %xmm2
----------------
spatel wrote:
> spatel wrote:
> > craig.topper wrote:
> > > Something weird happened here I think. We appear to be loading a constant as an immediate and moving it into a vector. Then we AND it with something that was just ANDed with a constant pool. Could the two ANDs use a single constant pool entry?
> > I think the root cause is that we don't have a combine for:
> > ZERO_EXTEND_VECTOR_INREG --> BITCAST
> > ...if we know all of the high bits are already zero.
> >
> > That's visible in existing tests - for example in the SSE41 or AVX1 output for this test, we don't need to pand+pmovzxwq.
> >
> > I have a draft of that patch, and it causes a massive amount of test diffs...taking a look now.
> Oh wait...I got my vector elements mixed up. This is just a bizarre build vector legalization:
>
> ```
> Legalizing: t83: v8i16 = BUILD_VECTOR Constant:i16<-1>, Constant:i16<0>, Constant:i16<0>, Constant:i16<0>, undef:i16, undef:i16, undef:i16, undef:i16
> Trying custom legalization
> Creating constant: t84: i32 = Constant<65535>
> Creating new node: t85: v4i32 = scalar_to_vector Constant:i32<65535>
> Creating constant: t86: i32 = Constant<0>
> Creating new node: t87: v4i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
> Creating new node: t88: v4i32 = vector_shuffle<4,1,2,3> t87, t85
> Creating new node: t89: v8i16 = bitcast t88
>
> ```
So we still end up with and-of-and or some variant of that here even with the updated build vector lowering.
I might've been staring at this too long, but I'm not sure how to solve it (and not sure if it's worth the effort).
Here's what I see happening:
1. turn the raw IR into a rotl
2. expand the rotl including masking of the shift amount
3. convert generic vector shifts to x86 shifts by splatted scalar amount
4. end up with two 'and' nodes whose target constants have different element widths:
```
t63: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<<8 x i16> <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>> 0
t61: v8i16,ch = load<(load 16 from constant-pool)> t0, t63, undef:i64
t48: v8i16 = and t4, t61
t95: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<<4 x i32> <i32 65535, i32 0, i32 0, i32 0>> 0
t96: v8i16,ch = load<(load 16 from constant-pool)> t0, t95, undef:i64
t80: v8i16 = and t48, t96
```
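For illustration only, here is a standalone C++ sketch (not LLVM code) of why the two masks above could in principle collapse into one constant. It assumes a little-endian host so that the `v4i32` to `v8i16` bitcast matches x86 lane order. The type aliases and the helper names `bitcastToV8i16`, `andV8i16`, and `foldedMask` are made up for this example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

using V8i16 = std::array<uint16_t, 8>;
using V4i32 = std::array<uint32_t, 4>;

// Reinterpret a v4i32 as a v8i16 (a bitcast). Assumes a little-endian host,
// so the lane order matches what x86 would see.
V8i16 bitcastToV8i16(const V4i32 &v) {
  V8i16 out;
  static_assert(sizeof(out) == sizeof(v), "both vectors must be 128 bits");
  std::memcpy(out.data(), v.data(), sizeof(out));
  return out;
}

// Lane-wise AND of two v8i16 values.
V8i16 andV8i16(const V8i16 &a, const V8i16 &b) {
  V8i16 out;
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = static_cast<uint16_t>(a[i] & b[i]);
  return out;
}

// Merge the two constant-pool masks from the dump (t61 and t96) into one.
V8i16 foldedMask() {
  V8i16 splat15;
  splat15.fill(15);                    // <8 x i16> splat of 15
  V4i32 lowLane = {65535, 0, 0, 0};    // <4 x i32> <65535, 0, 0, 0>
  return andV8i16(splat15, bitcastToV8i16(lowLane));
}
```

Under that assumption, `(x & t61) & t96` gives the same lanes as `x & foldedMask()` for any `x`, since AND is associative. Whether the combiner can actually see both constants through the constant-pool loads at this stage is a separate question that this sketch does not address.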
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D79886/new/
https://reviews.llvm.org/D79886