[PATCH] D79886: [DAGCombiner] try to move splat after binop with splat constant

Mon May 25 13:24:22 PDT 2020

spatel marked an inline comment as done.
spatel added inline comments.

================
Comment at: llvm/test/CodeGen/X86/vector-rotate-128.ll:854
 ; SSE2:       # %bb.0:
-; SSE2-NEXT:    pshuflw {{.*#+}} xmm1 = xmm1[0,0,2,3,4,5,6,7]
-; SSE2-NEXT:    pshufd {{.*#+}} xmm1 = xmm1[0,0,0,0]
+; SSE2-NEXT:    movl $65535, %eax # imm = 0xFFFF
+; SSE2-NEXT:    movd %eax, %xmm2
----------------
spatel wrote:
> spatel wrote:
> > craig.topper wrote:
> > > Something weird happened here I think. We appear to be loading a constant as an immediate and moving it into a vector. Then we AND it with something that was just ANDed with a constant pool value. Could the two ANDs use a single constant pool entry?
> > I think the root cause is that we don't have a combine for:
> > ZERO_EXTEND_VECTOR_INREG --> BITCAST 
> > ...if we know all of the high bits are already zero.
> > 
> > That's visible in existing tests - for example in the SSE41 or AVX1 output for this test, we don't need to pand+pmovzxwq.
> > 
> > I have a draft of that patch, and it causes a massive amount of test diffs...taking a look now.
> Oh wait...I got my vector elements mixed up. This is just a bizarre build vector legalization:
> 
> ```
> Legalizing: t83: v8i16 = BUILD_VECTOR Constant:i16<-1>, Constant:i16<0>, Constant:i16<0>, Constant:i16<0>, undef:i16, undef:i16, undef:i16, undef:i16
> Trying custom legalization
> Creating constant: t84: i32 = Constant<65535>
> Creating new node: t85: v4i32 = scalar_to_vector Constant:i32<65535>
> Creating constant: t86: i32 = Constant<0>
> Creating new node: t87: v4i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
> Creating new node: t88: v4i32 = vector_shuffle<4,1,2,3> t87, t85
> Creating new node: t89: v8i16 = bitcast t88
> 
> ```
So even with the updated build vector lowering, we still end up with and-of-and (or some variant of that) here.
I might've been staring at this too long, but I'm not sure how to solve it (and not sure if it's worth the effort).
Here's what I see happening:
1. turn the raw IR into a rotl
2. expand the rotl including masking of the shift amount
3. convert generic vector shifts to x86 shifts by splatted scalar amount
4. end up with two 'and' nodes whose constant-pool operands have different element widths (v8i16 vs. v4i32):

```
      t63: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<<8 x i16> <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>> 0
    t61: v8i16,ch = load<(load 16 from constant-pool)> t0, t63, undef:i64
  t48: v8i16 = and t4, t61
            t95: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<<4 x i32> <i32 65535, i32 0, i32 0, i32 0>> 0
          t96: v8i16,ch = load<(load 16 from constant-pool)> t0, t95, undef:i64
        t80: v8i16 = and t48, t96

```
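
For reference, here is a minimal sketch of the arithmetic behind the two masks in the dump above, written as plain Python rather than anything from the DAGCombiner. It assumes x86 little-endian lane order when the `<4 x i32>` constant is viewed as v8i16, and the sample input `x` is made up. It only shows that the two constants could in principle be merged into one mask; it says nothing about where such a fold would fit in the DAG.

```python
import struct

def bitcast_v4i32_to_v8i16(lanes32):
    """Reinterpret four 32-bit lanes as eight 16-bit lanes (little-endian, as on x86)."""
    raw = struct.pack("<4I", *lanes32)
    return list(struct.unpack("<8H", raw))

def and_lanes(a, b):
    """Lane-wise AND of two equal-length vectors."""
    return [x & y for x, y in zip(a, b)]

# Constant loaded by t61: <8 x i16> splat of 15 (the rotate-amount mask).
shift_mask = [15] * 8

# Constant loaded by t96: <4 x i32> <65535, 0, 0, 0>, used as v8i16.
low_elt_mask = bitcast_v4i32_to_v8i16([65535, 0, 0, 0])

# (x & shift_mask) & low_elt_mask == x & (shift_mask & low_elt_mask)
folded_mask = and_lanes(shift_mask, low_elt_mask)

# Hypothetical sample input, only to compare the two forms.
x = [0x1234, 0xFFFF, 7, 0x00F3, 1, 2, 3, 4]
two_ands = and_lanes(and_lanes(x, shift_mask), low_elt_mask)  # t48, then t80
one_and = and_lanes(x, folded_mask)
```

Under those assumptions, `low_elt_mask` keeps only element 0, so the folded constant is 15 in element 0 and zero elsewhere.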

https://reviews.llvm.org/D79886