[PATCH] D103462: [SDAG] allow more cast folding for vector sext-of-setcc

Sanjay Patel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jun 2 08:49:55 PDT 2021


spatel added inline comments.


================
Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:10846
+          if (User->getOpcode() != ISD::ZERO_EXTEND ||
+              User->getValueType(0) != VT)
+            return false;
----------------
lebedev.ri wrote:
> spatel wrote:
> > lebedev.ri wrote:
> > > What if it is extending to the even wider type than `VT`?
> > It seems neutral whether we have wider or narrower other type - trade a sext of the compare for a zext of the input. This might be because the transform to create zext-loads is conservative about uses.
> > 
> > Let me know if this matches what you have in mind (I can add a test either way):
> > 
> > ```
> > define <8 x i32> @multi_use_wider_size(<8 x i8>* %src, <8 x i16>* %dst) nounwind {
> >   %load = load <8 x i8>, <8 x i8>* %src
> >   %zext = zext <8 x i8> %load to <8 x i32>
> >   %icmp = icmp eq <8 x i8> %load, zeroinitializer
> >   %sext = sext <8 x i1> %icmp to <8 x i16>
> >   store <8 x i16> %sext, <8 x i16>* %dst
> >   ret <8 x i32> %zext
> > }
> > ```
> > 
> > Currently, we get:
> > 
> > ```
> > vmovq	(%rdi), %xmm1
> > vpmovzxbd	%xmm1, %ymm0 ; 256-bit return value
> > vpxor	%xmm2, %xmm2, %xmm2
> > vpcmpeqb %xmm2, %xmm1, %xmm1
> > vpmovsxbw %xmm1, %xmm1
> > vmovdqa	%xmm1, (%rsi) ; 128-bit compare value
> > 
> > ```
> > If we allow type mismatch:
> > 
> > ```
> > vmovq	(%rdi), %xmm1
> > vpmovzxbd	%xmm1, %ymm0 
> > vpmovzxbw	%xmm1, %xmm1 
> > vpxor	%xmm2, %xmm2, %xmm2
> > vpcmpeqw	%xmm2, %xmm1, %xmm1
> > vmovdqa	%xmm1, (%rsi)
> > 
> > ```
> My point being, iff the other ZEXT is to wider type, then i would expect that the ZEXT we create
> would be absorbed by the load, and that other ZEXT would become ZEXR of ZEXT_LOAD,
> but it seems like we end up with a plain load + 2 ZEXT's?
Right - stepping through the debug spew for this example, if we allow the transform here, we have:
  t7: v8i8,ch = load<(load 8 from %ir.src)> t0, t2, undef:i64
  t21: v8i16 = zero_extend t7 -- feeds into the setcc
  t8: v8i32 = zero_extend t7 -- return value

But the code in ExtendUsesToFormExtLoad() won't convert that load because it doesn't handle the extra use. It already has another FIXME comment about setcc patterns, so we could add one more there.

As a hack, I eased the restriction to see what we would end up with on this x86 example, and it's going to need more work. We truncate the zext load and then zext it back out, so we get this:
  vpmovzxbw	(%rdi), %xmm1   
  vpackuswb	%xmm1, %xmm1, %xmm0
  vpmovzxbd	%xmm0, %ymm0

Ideally, we'd have something like:
  vpmovzxbw	(%rdi), %xmm1   
  vpmovzxwd	%xmm1, %ymm0

So it's going to be a multi-patch process if we want to allow more patterns here, and it's not clear to me that we actually gain from that transform (got rid of a sext, but added a zext?).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103462/new/

https://reviews.llvm.org/D103462



More information about the llvm-commits mailing list