[PATCH] D49179: [InstCombine] Fold x & (-1 >> y) == x to x u<= (-1 >> y)

Thu Jul 12 07:29:42 PDT 2018

lebedev.ri added a comment.

In https://reviews.llvm.org/D49179#1160049, @spatel wrote:

> In https://reviews.llvm.org/D49179#1160020, @lebedev.ri wrote:
>
> > In https://reviews.llvm.org/D49179#1159036, @spatel wrote:
> >
> > > In https://reviews.llvm.org/D49179#1159024, @lebedev.ri wrote:
> > >
> > > > In https://reviews.llvm.org/D49179#1158957, @lebedev.ri wrote:
> > > >
> > > > > In https://reviews.llvm.org/D49179#1158930, @spatel wrote:
> > > > >
> > > > > > LGTM.
> > > > >
> > > > >
> > > > > Thank you for such speedy review!
> > > > >
> > > > > I'm wondering, what about the signed case?
> > > > >  https://godbolt.org/g/WvWX13
> > > > >  https://godbolt.org/g/DM7XA4
> > > > >  https://rise4fun.com/Alive/Qslx (check that 25 high bits are either all-ones, or all-zeros)
> > > > >
> > > > > Can we do this in instcombine?
> > > > >  Or not, given that we were disabling some bit-fiddling transforms lately?
> > > >
> > > >
> > > > Actually, not sure that is any better, will leave that for now.
> > > >  Though the reverse transform might be a good thing for dagcombine.
> > >
> > >
> > > It's a good question, but probably better asked on the dev list than here. I think we prefer to canonicalize to the form with less instructions even if that means we lose information from the eliminated ops.
> >
> >
> > For future reference, here is a more straight-forward fold https://rise4fun.com/Alive/XuW, that will afterwards fold back into and+icmp https://godbolt.org/g/bm3yZu
> >  But any such fold will clearly need dagcombine work, since the 'naive' version with shifts seems to produce optimal assembly already.
> >  So after all i'm not sure i'm motivated to look into the 'signed truncation pattern' right now..
> >  F6668611: x86-signed-truncation-check.ll <https://reviews.llvm.org/F6668611> F6668610: aarch64-signed-truncation-check.ll <https://reviews.llvm.org/F6668610>
>
>
>
>
>   Name: signed truncation check
>     %old0 = shl i16 %x, 8
>     %old1 = ashr exact i16 %old0, 8
>     %ret = icmp eq i16 %old1, %x
>   =>
>     %new0 = icmp slt i16 %x, 128
>     %new1 = icmp sgt i16 %x, -129
>     %ret = and i1 %new1, %new0
>  
>   Name: and-of-icmps
>     %new0 = icmp slt i16 %x, 128
>     %new1 = icmp sgt i16 %x, -129
>     %ret = and i1 %new1, %new0
>   =>
>     %x.off = add i16 %x, 128
>     %ret = icmp ult i16 %x.off, 256
>
>
> Let me make sure I'm seeing it - as a question of IR canonicalization, we're deciding which of these 3 forms is best?

Not really. I agree that the last one is the best from IR clarity standpoint.
I was just looking for some other variants of this pattern.
(Sadly souper was of no help here, strangely.)

> I don't see any reason to favor the 1st two options over the 3rd (shorter) one. We already convert the 2nd to the 3rd (although as noted here, that transform is not happening as expected in all cases).

I agree.

> If the backend can't produce optimal code from the 3rd form, that's a concern, but it doesn't necessarily have to hold up the IR improvement.
>  We might live with that (hopefully minor and temporary) problem because the IR improvement can lead to optimizations in other passes that we don't see in any of the minimal examples.

Repository:
  rL LLVM

https://reviews.llvm.org/D49179