[llvm-dev] [RFC] Canonicalization of unsigned subtraction with saturation

Fri May 26 07:57:32 PDT 2017

There's no formal process to decide a canonicalization (and it could always
be changed), but given that we have:
(a) not heard any objections to the min/max forms
(b) see advantages to the min/max forms

I think it's worth adding codegen tests for the preferred IR patterns. At
the least, we want to make sure that there are no regressions for those
forms vs. the forms that do not contain min/max. For x86, we already know
that we want to see 'psubus' on some of these, so getting that implemented
is also a safe step IMO.

On Fri, May 26, 2017 at 3:46 AM, Koval, Julia <julia.koval at intel.com> wrote:

> So, can we declare the second version as a cannonical?
> Second version also has another advantage. If operands of select are
> different types, it still works as max. In the first case it is separated
> by trunk and not converts to max instruction in the end. Here is an example
> for 32 and 16 bit:
>
> (1)
> void foo(unsigned short *p, int max, int n) {
>   int i;
> unsigned m;
> for (i = 0; i < n; i++) {
>     m = *--p;
>     *p = (unsigned short)(m >= max ? m-max : 0);
>   }
> }
>
> (2)
> void goo(unsigned short *p, int max, int n) {
>   int i;
>   unsigned m;
>   for (i = 0; i < n; i++) {
>     m = *--p;
>     unsigned umax = m > max ? m : max;
>     *p = (unsigned short)(umax - max);
>   }
> }
>
> (1)
> %cmp = icmp ugt i32 %x, %y
> %sub2 = sub i32 %y, %x
> %tr = trunc i32 %sub2 to i16
> %res = select i1 %cmp, i16 0, i16 %tr
>
>  or
>
> (2)
> %cmp = icmp ugt i32 %x, %y
> %sel = select i1 %cmp, i32 %x, i32 %y
> %sub = sub i32 %sel, %x
> %res = trunc i32 %sub to i16
>
>
> -Julia
>
> > -----Original Message-----
> > From: Sanjay Patel [mailto:spatel at rotateright.com]
> > Sent: Tuesday, May 16, 2017 9:49 PM
> > To: Friedman, Eli <efriedma at codeaurora.org>
> > Cc: Koval, Julia <julia.koval at intel.com>; llvm-dev at lists.llvm.org;
> Bozhenov,
> > Nikolai <nikolai.bozhenov at intel.com>; Elovikov, Andrei
> > <andrei.elovikov at intel.com>; Hal Finkel <hfinkel at anl.gov>; David
> Majnemer
> > <david.majnemer at gmail.com>
> > Subject: Re: [RFC] Canonicalization of unsigned subtraction with
> saturation
> >
> >
> >
> > On Tue, May 16, 2017 at 12:18 PM, Friedman, Eli <efriedma at codeaurora.org
> > <mailto:efriedma at codeaurora.org> > wrote:
> >
> >
> >
> >       On 5/16/2017 6:30 AM, Sanjay Patel wrote:
> >
> >
> >               Thanks for posting this question, Julia.
> >
> >               I had a similar question about a signed min/max variant
> here:
> >               http://lists.llvm.org/pipermail/llvm-dev/2016-
> > November/106868.html <http://lists.llvm.org/pipermail/llvm-dev/2016-
> > November/106868.html>
> >
> >               The 2nd version in each case contains a canonical max/min
> > representation in IR, and this could enable more IR analysis.
> >               A secondary advantage is that the backend recognizes the
> > max/min in the second IR form when creating DAG nodes,
> >               and this directly affects isel for many targets.
> >
> >
> >
> >       This seems important.  And pattern-matching max(x,y)-y to a
> saturating
> > subtract seems easy in the backend.
> >
> >
> >
> >               A possibly important difference between the earlier example
> > and the current unsigned case:
> >               is a select with a zero constant operand easier to reason
> about
> > in IR than the canonical min/max?
> >
> >
> >
> >       It might be in some cases... maybe?  I mean, it might be easier to
> > analyze in ComputeMaskedBits or something, but we don't really do much to
> > optimize selects involving zero.
> >
> >
> >
> > Because of my CPU upbringing, I always see this:
> >
> > select A, B, 0
> >
> >
> > as:
> >
> > and (sext A), B
> >
> >
> > Any chance of control-flow is bad!
> > ...but now I know that's wrong for IR. :)
> >
> >
> > So forming the min/max sounds like the right answer to me.
> >
> >
> > Note that we don't actually canonicalize the signed min/max cases from
> the
> > earlier thread yet. We detect those in value tracking, and that was good
> enough
> > to produce the ideal backend results, but I haven't gotten back to doing
> the
> > transform in IR.
> >
> >
> >
> >
> >
> >
> >
> >       -Eli
> >
> >
> >
> >               On Tue, May 16, 2017 at 5:30 AM, Koval, Julia
> > <julia.koval at intel.com <mailto:julia.koval at intel.com> > wrote:
> >
> >
> >                       (1.16)
> >                       %cmp = icmp ugt i16 %x, %y
> >                       %sub2 = sub i16 %y, %x
> >                       %res = select i1 %cmp, i16 0, i16 %sub2
> >
> >                       or
> >
> >                       (2.16)
> >                       %cmp = icmp ugt i16 %x, %y
> >                       %sel = select i1 %cmp, i16 %x, i16 %y
> >                       %sub = sub i16 %sel, %x
> >
> >                       Which of these versions is canonical? I think first
> > version is better, because it can be converted to unsigned saturation
> > instruction(i.e. PSUBUS), using existing backend code.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >       --
> >       Employee of Qualcomm Innovation Center, Inc.
> >       Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> > a Linux Foundation Collaborative Project
> >
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170526/df8d7f38/attachment.html>