[llvm-dev] should we have IR intrinsics for integer min/max?

Mon Nov 7 16:13:06 PST 2016

Thanks, Hal and Matt for the feedback. As usual, my instincts about
canonicalization were probably wrong. :)

I thought that @max1 vs. @max3 would be viewed as an unknowable trade-off
between reducing the dependency chain and the pseudo-canonical min/max
form, so we'd add intrinsics, and defer that decision to the backend.

I'll wait to see if there are any other arguments presented.

@max2 vs. @max3 is a straightforward commute that we should have been doing
anyway, so I can start there. Assuming we go with @max3, we need to add
something to DAGCombine to turn that back into @max1 (PPC w/ isel and
AArch64 do better with @max1; x86 is the same).

On Mon, Nov 7, 2016 at 2:47 PM, Hal Finkel <hfinkel at anl.gov> wrote:

>
> ------------------------------
>
> *From: *"Sanjay Patel via llvm-dev" <llvm-dev at lists.llvm.org>
> *To: *"llvm-dev" <llvm-dev at lists.llvm.org>
> *Sent: *Monday, November 7, 2016 1:01:27 PM
> *Subject: *[llvm-dev] should we have IR intrinsics for integer min/max?
>
> Hi -
>
> The answer to this question may help to resolve larger questions about
> intrinsics and vectorization that were discussed at the dev mtg last week,
> but let's start with the basics:
>
> Which, if any, of these is the canonical IR?
>
> ; ret = x < y ? 0 : x-y
> define i32 @max1(i32 %x, i32 %y) {
>   %sub = sub nsw i32 %x, %y
>   %cmp = icmp slt i32 %x, %y ; cmp is independent of sub
>   %sel = select i1 %cmp, i32 0, i32 %sub
>   ret i32 %sel
> }
>
> ; ret = (x-y) < 0 ? 0 : x-y
> define i32 @max2(i32 %x, i32 %y) {
>   %sub = sub nsw i32 %x, %y
>   %cmp = icmp slt i32 %sub, 0 ; cmp depends on sub, but this looks more
> like a max?
>   %sel = select i1 %cmp, i32 0, i32 %sub
>   ret i32 %sel
> }
>
> ; ret = (x-y) > 0 ? x-y : 0
> define i32 @max3(i32 %x, i32 %y) {
>   %sub = sub nsw i32 %x, %y
>   %cmp = icmp sgt i32 %sub, 0 ; canonicalize cmp+sel - looks even more
> like a max?
>   %sel = select i1 %cmp, i32 %sub, i32 0
>   ret i32 %sel
> }
>
> Noting that all of the above use the same number of IR instructions, I
> prefer this third option:
>
>  1. It uses fewer values in the icmp/select, so the live range of the x
> and y, individually, is shorter. This seems like a reasonable metric for
> simplicity.
>  2. Using a comparison of (x-y) against zero likely makes it easier for
> computing known bits to simply the answer (you only need to compute the
> sign bit).
>  3. The constant of the select, 0, is the second argument (which seems to
> reflect our general canonical choice).
>
>
> define i32 @max4(i32 %x, i32 %y) {
>   %sub = sub nsw i32 %x, %y
>   %max = llvm.smax.i32(i32 %sub, i32 0) ; this intrinsic doesn't exist
> today
>   ret i32 %max
> }
>
> I don't currently see the need for a new intrinsic.
>
>
> FWIW, InstCombine doesn't canonicalize any of the first 3 options
> currently. Codegen suffers because of that (depending on the target machine
> and data types). Regardless of the IR choice, some backend fixes are needed.
>
> Another possible consideration is the structure/accuracy of the cost
> models used by the vectorizers and other passes. I don't think they ever
> special-case the cmp+sel pair as a possibly unified (and therefore cheaper
> than the sum of the parts) operation.
>
> We don't have a facility currently for the target to provide a cost for
> combined operations. We should, but there's design work to be done.
>
>  -Hal
>
>
> Note that we added FP variants for min/max ops with:
> https://reviews.llvm.org/rL220341
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161107/3b735331/attachment.html>