[PATCH] D103462: [SDAG] allow more cast folding for vector sext-of-setcc

Roman Lebedev via Phabricator via llvm-commits <llvm-commits at lists.llvm.org>
Wed Jun 2 08:55:58 PDT 2021


lebedev.ri accepted this revision.
lebedev.ri added a comment.
This revision is now accepted and ready to land.

Seems fine to me unless others have further comments.
Thanks.



================
Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:10844
+            continue;
+          // Other users should get folded into the load.
+          if (User->getOpcode() != ISD::ZERO_EXTEND ||
----------------
`// The only extra user we are okay with is the exact same cast we are about to create.`
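
For context, here is roughly how that wording sits in the use-scan, written as a hypothetical standalone helper rather than the inlined check from the diff above (the helper name, its signature, and the surrounding setup are illustrative assumptions, not the patch itself):

```
// A minimal sketch, assuming LLVM's SelectionDAG API of this era; the helper
// name and signature are illustrative only.
#include "llvm/CodeGen/SelectionDAGNodes.h"
using namespace llvm;

// Return true if every user of the loaded value, other than the setcc that
// drives this fold, is a zero-extend to VT -- i.e. the exact cast the
// combine is about to create, which would simply CSE with it.
static bool otherUsersAreSameCast(SDValue Load, SDNode *SetCC, EVT VT) {
  for (SDNode *User : Load.getNode()->uses()) {
    if (User == SetCC)
      continue;
    // The only extra user we are okay with is the exact same cast we are
    // about to create.
    if (User->getOpcode() != ISD::ZERO_EXTEND || User->getValueType(0) != VT)
      return false;
  }
  return true;
}
```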


================
Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:10846
+          if (User->getOpcode() != ISD::ZERO_EXTEND ||
+              User->getValueType(0) != VT)
+            return false;
----------------
spatel wrote:
> lebedev.ri wrote:
> > spatel wrote:
> > > lebedev.ri wrote:
> > > > What if it is extending to an even wider type than `VT`?
> > > It seems neutral whether the other type is wider or narrower - we trade a sext of the compare for a zext of the input. This might be because the transform to create zext-loads is conservative about uses.
> > > 
> > > Let me know if this matches what you have in mind (I can add a test either way):
> > > 
> > > ```
> > > define <8 x i32> @multi_use_wider_size(<8 x i8>* %src, <8 x i16>* %dst) nounwind {
> > >   %load = load <8 x i8>, <8 x i8>* %src
> > >   %zext = zext <8 x i8> %load to <8 x i32>
> > >   %icmp = icmp eq <8 x i8> %load, zeroinitializer
> > >   %sext = sext <8 x i1> %icmp to <8 x i16>
> > >   store <8 x i16> %sext, <8 x i16>* %dst
> > >   ret <8 x i32> %zext
> > > }
> > > ```
> > > 
> > > Currently, we get:
> > > 
> > > ```
> > > vmovq	(%rdi), %xmm1
> > > vpmovzxbd	%xmm1, %ymm0 ; 256-bit return value
> > > vpxor	%xmm2, %xmm2, %xmm2
> > > vpcmpeqb %xmm2, %xmm1, %xmm1
> > > vpmovsxbw %xmm1, %xmm1
> > > vmovdqa	%xmm1, (%rsi) ; 128-bit compare value
> > > 
> > > ```
> > > If we allow type mismatch:
> > > 
> > > ```
> > > vmovq	(%rdi), %xmm1
> > > vpmovzxbd	%xmm1, %ymm0 
> > > vpmovzxbw	%xmm1, %xmm1 
> > > vpxor	%xmm2, %xmm2, %xmm2
> > > vpcmpeqw	%xmm2, %xmm1, %xmm1
> > > vmovdqa	%xmm1, (%rsi)
> > > 
> > > ```
> > My point being, iff the other ZEXT is to a wider type, then I would expect that the ZEXT we create
> > would be absorbed by the load, and that other ZEXT would become a ZEXT of a ZEXT_LOAD,
> > but it seems like we end up with a plain load + 2 ZEXTs?
> Right - stepping through the debug spew for this example, if we allow the transform here, we have:
>   t7: v8i8,ch = load<(load 8 from %ir.src)> t0, t2, undef:i64
>   t21: v8i16 = zero_extend t7 -- feeds into the setcc
>   t8: v8i32 = zero_extend t7 -- return value
> 
> But the code in ExtendUsesToFormExtLoad() won't convert that load because it doesn't handle the extra use. It already has another FIXME comment about setcc patterns, so we could add one more there.
> 
> As a hack, I eased the restriction to see what we would end up with on this x86 example, and it's going to need more work. We truncate the zext load and then zext it back out, so we get this:
>   vpmovzxbw	(%rdi), %xmm1   
>   vpackuswb	%xmm1, %xmm1, %xmm0
>   vpmovzxbd	%xmm0, %ymm0
> 
> Ideally, we'd have something like:
>   vpmovzxbw	(%rdi), %xmm1   
>   vpmovzxwd	%xmm1, %ymm0
> 
> So it's going to be a multi-patch process if we want to allow more patterns here, and it's not clear to me that we actually gain from that transform (got rid of a sext, but added a zext?).
Thanks for checking.
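
For reference, the relaxation discussed above (also accepting a zero-extend to a type wider than `VT`) would amount to loosening the type check roughly along these lines; this is a sketch of the idea only, with assumed names and the same headers as the sketch earlier in this email, not the landed code:

```
// Relaxed variant (sketch, assumed names). Accept any zero-extend of the
// load whose element type is at least as wide as VT's, instead of requiring
// an exact type match.
static bool otherUsersAreWiderZExts(SDValue Load, SDNode *SetCC, EVT VT) {
  for (SDNode *User : Load.getNode()->uses()) {
    if (User == SetCC)
      continue;
    if (User->getOpcode() != ISD::ZERO_EXTEND ||
        User->getValueType(0).getScalarSizeInBits() < VT.getScalarSizeInBits())
      return false;
  }
  return true;
}
```

As the discussion above notes, whether this is worth pursuing hinges on teaching ExtendUsesToFormExtLoad() to handle the extra use; otherwise the relaxed check just trades the sext for an extra zext on the x86 example shown.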


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103462/new/

https://reviews.llvm.org/D103462


