<html><head><style type='text/css'>p { margin: 0; }</style></head><body><div style='font-family: arial,helvetica,sans-serif; font-size: 10pt; color: #000000'><br><hr id="zwchr"><blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px; color: rgb(0, 0, 0); font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><b>From: </b>"Sanjay Patel via llvm-dev" <llvm-dev@lists.llvm.org><br><b>To: </b>"llvm-dev" <llvm-dev@lists.llvm.org><br><b>Sent: </b>Monday, November 7, 2016 1:01:27 PM<br><b>Subject: </b>[llvm-dev] should we have IR intrinsics for integer min/max?<br><br><div dir="ltr"><div><div><div>Hi -<br><br></div>The answer to this question may help to resolve larger questions about intrinsics and vectorization that were discussed at the dev mtg last week, but let's start with the basics:<br><br></div>Which, if any, of these is the canonical IR?<br></div><div><br>; ret = x < y ? 0 : x-y<br></div><div>define i32 @max1(i32 %x, i32 %y) {<br>  %sub = sub nsw i32 %x, %y<br>  %cmp = icmp slt i32 %x, %y ; cmp is independent of sub<br>  %sel = select i1 %cmp, i32 0, i32 %sub<br>  ret i32 %sel<br>}<br><br></div><div>; ret = (x-y) < 0 ? 0 : x-y<br></div><div>define i32 @max2(i32 %x, i32 %y) {<br>  %sub = sub nsw i32 %x, %y<br>  %cmp = icmp slt i32 %sub, 0 ; cmp depends on sub, but this looks more like a max?<br>  %sel = select i1 %cmp, i32 0, i32 %sub<br>  ret i32 %sel<br>}<br><br></div><div>; ret = (x-y) > 0 ? x-y : 0<br></div><div id="DWT18795">define i32 @max3(i32 %x, i32 %y) {<br>  %sub = sub nsw i32 %x, %y<br>  %cmp = icmp sgt i32 %sub, 0 ; canonicalize cmp+sel - looks even more like a max?<br>  %sel = select i1 %cmp, i32 %sub, i32 0<br>  ret i32 %sel<br>}<br></div></div></blockquote>Noting that all of the above  use the same number of IR instructions, I prefer this third option:<br><br> 1. It uses fewer values in the icmp/select, so the live range of the x and y, individually, is shorter. This seems like a reasonable metric for simplicity.<br> 2. Using a comparison of (x-y) against zero likely makes it easier for computing known bits to simply the answer (you only need to compute the sign bit).<br> 3. The constant of the select, 0, is the second argument (which seems to reflect our general canonical choice).<br><blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px; color: rgb(0, 0, 0); font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><div dir="ltr"><div><br>define i32 @max4(i32 %x, i32 %y) {<br>  %sub = sub nsw i32 %x, %y<br></div><div>  %max = llvm.smax.i32(i32 %sub, i32 0) ; this intrinsic doesn't exist today<br></div><div id="DWT18796">  ret i32 %max<br>}<br><br></div></div></blockquote>I don't currently see the need for a new intrinsic.<br><blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px; color: rgb(0, 0, 0); font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><div dir="ltr"><div></div><div><br></div><div><div><div><div><div><div><div>FWIW, InstCombine doesn't canonicalize any of the first 3 options currently. Codegen suffers because of that (depending on the target machine and data types). Regardless of the IR choice, some backend fixes are needed.<br></div><div id="DWT18794"><br>Another possible consideration is the structure/accuracy of the cost models used by the 

vectorizers and other passes. I don't think they ever special-case 

the cmp+sel pair as a possibly unified (and therefore cheaper than the sum of the parts) 

operation.<br></div></div></div></div></div></div></div></div></blockquote>We don't have a facility currently for the target to provide a cost for combined operations. We should, but there's design work to be done.<br><br> -Hal<br><blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px; color: rgb(0, 0, 0); font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><div dir="ltr"><div><div><div><div><div><div><div><br>Note that we added FP variants for min/max ops with:<br><a href="https://reviews.llvm.org/rL220341" target="_blank">https://reviews.llvm.org/rL220341</a><br><br></div><div><br></div></div></div></div></div></div></div></div>

<br>_______________________________________________<br>LLVM Developers mailing list<br>llvm-dev@lists.llvm.org<br>http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev<br></blockquote><br><br><br>-- <br><div><span name="x"></span>Hal Finkel<br>Lead, Compiler Technology and Programming Languages<br>Leadership Computing Facility<br>Argonne National Laboratory<span name="x"></span><br></div></div></body></html>