[llvm-dev] should we have IR intrinsics for integer min/max?

Mon Nov 7 13:47:26 PST 2016

----- Original Message -----

> From: "Sanjay Patel via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Monday, November 7, 2016 1:01:27 PM
> Subject: [llvm-dev] should we have IR intrinsics for integer min/max?

> Hi -

> The answer to this question may help to resolve larger questions
> about intrinsics and vectorization that were discussed at the dev
> mtg last week, but let's start with the basics:

> Which, if any, of these is the canonical IR?

> ; ret = x < y ? 0 : x-y

> define i32 @max1(i32 %x, i32 %y) {
> %sub = sub nsw i32 %x, %y
> %cmp = icmp slt i32 %x, %y ; cmp is independent of sub
> %sel = select i1 %cmp, i32 0, i32 %sub
> ret i32 %sel
> }

> ; ret = (x-y) < 0 ? 0 : x-y

> define i32 @max2(i32 %x, i32 %y) {
> %sub = sub nsw i32 %x, %y
> %cmp = icmp slt i32 %sub, 0 ; cmp depends on sub, but this looks more like a max?
> %sel = select i1 %cmp, i32 0, i32 %sub
> ret i32 %sel
> }

> ; ret = (x-y) > 0 ? x-y : 0

> define i32 @max3(i32 %x, i32 %y) {
> %sub = sub nsw i32 %x, %y
> %cmp = icmp sgt i32 %sub, 0 ; canonicalize cmp+sel - looks even more like a max?
> %sel = select i1 %cmp, i32 %sub, i32 0
> ret i32 %sel
> }

Noting that all of the above use the same number of IR instructions, I prefer this third option: 

1. It uses fewer values in the icmp/select, so the live ranges of x and y are each shorter. This seems like a reasonable metric for simplicity.
2. Comparing (x-y) against zero likely makes it easier for a known-bits computation to simplify the answer (you only need to compute the sign bit).
3. The constant of the select, 0, is the second argument (which seems to reflect our general canonical choice). 
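
For illustration only (this sketch is not from the original thread), here are the three quoted forms in C, along with a fourth function showing why point 2 needs only the sign bit of the difference. The helper names sub_nsw and max_signbit are hypothetical. The sketch assumes the difference x-y fits in 32 bits, which is what the nsw flag asserts in the IR; without that assumption the first form can differ from the other two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: subtract in 64 bits so the sketch itself never
   overflows. The assert models the `nsw` precondition from the IR. */
static int32_t sub_nsw(int32_t x, int32_t y) {
    int64_t wide = (int64_t)x - (int64_t)y;
    assert(wide >= INT32_MIN && wide <= INT32_MAX);
    return (int32_t)wide;
}

/* max1: ret = x < y ? 0 : x-y (compare is independent of the sub). */
int32_t max1(int32_t x, int32_t y) {
    int32_t sub = sub_nsw(x, y);
    return (x < y) ? 0 : sub;
}

/* max2: ret = (x-y) < 0 ? 0 : x-y (compare depends on the sub). */
int32_t max2(int32_t x, int32_t y) {
    int32_t sub = sub_nsw(x, y);
    return (sub < 0) ? 0 : sub;
}

/* max3: ret = (x-y) > 0 ? x-y : 0 (predicate and select arms swapped). */
int32_t max3(int32_t x, int32_t y) {
    int32_t sub = sub_nsw(x, y);
    return (sub > 0) ? sub : 0;
}

/* Hypothetical variant: the result is decided by the sign bit of the
   difference alone, which is all a known-bits query has to establish. */
int32_t max_signbit(int32_t x, int32_t y) {
    int32_t sub = sub_nsw(x, y);
    uint32_t sign = (uint32_t)sub >> 31; /* 1 if sub is negative, else 0 */
    return sign ? 0 : sub;
}
```

All four functions agree whenever the subtraction does not overflow. In max2 and max3, x and y are dead after the subtraction, whereas max1 keeps both alive until the compare.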

> define i32 @max4(i32 %x, i32 %y) {
> %sub = sub nsw i32 %x, %y

> %max = llvm.smax.i32(i32 %sub, i32 0) ; this intrinsic doesn't exist today

> ret i32 %max
> }

I don't currently see the need for a new intrinsic. 

> FWIW, InstCombine doesn't canonicalize any of the first 3 options
> currently. Codegen suffers because of that (depending on the target
> machine and data types). Regardless of the IR choice, some backend
> fixes are needed.

> Another possible consideration is the structure/accuracy of the cost
> models used by the vectorizers and other passes. I don't think they
> ever special-case the cmp+sel pair as a possibly unified (and
> therefore cheaper than the sum of the parts) operation.

We don't have a facility currently for the target to provide a cost for combined operations. We should, but there's design work to be done. 

-Hal 

> Note that we added FP variants for min/max ops with:
> https://reviews.llvm.org/rL220341

-- 

Hal Finkel 
Lead, Compiler Technology and Programming Languages 
Leadership Computing Facility 
Argonne National Laboratory 