<div dir="ltr"><div>Why not also support 8-bit vectors?</div><div><br></div>I'd be interested in these intrinsics also supporting integer vector types with the same semantics, and in teaching the vectorizer to handle scalar and vector safe division the same way it handles ordinary scalar and vector division.<div>

<div><br></div><div>FWIW we did something in PNaCl, but only to guarantee that integer division/modulo by zero reliably traps on all architectures:</div><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div><a href="https://chromium.googlesource.com/native_client/pnacl-llvm/+/master/lib/Transforms/NaCl/InsertDivideCheck.cpp">https://chromium.googlesource.com/native_client/pnacl-llvm/+/master/lib/Transforms/NaCl/InsertDivideCheck.cpp</a></div>

</blockquote></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Apr 23, 2014 at 10:52 PM, Michael Zolotukhin <span dir="ltr">&lt;<a href="mailto:mzolotukhin@apple.com" target="_blank">mzolotukhin@apple.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div style="font-size:13px">Hi,</div><div style="font-size:13px"><br></div><div style="font-size:13px">

I’d like to propose extending the set of LLVM IR intrinsics with new ones for safe division. LLVM already has intrinsics for detecting overflow, such as sadd.with.overflow, and the intrinsics I’m proposing would augment that set.</div>

<div style="font-size:13px"><br></div><div style="font-size:13px">The new intrinsics will return a structure with two elements according to the following rules:</div><div style="font-size:13px"><ol><li>safe.[us]div(x,0) = safe.[us]rem(x,0) = {0, 1}</li>

<li>safe.sdiv(min&lt;T&gt;, -1) = safe.srem(min&lt;T&gt;, -1) = {min&lt;T&gt;, 1}</li><li>In other cases: safe.op(x,y) = {x op y, 0}, where op is sdiv, udiv, srem, or urem</li></ol></div><div style="font-size:13px"><br></div>

<div style="font-size:13px">These intrinsics would be used in much the same way as the arith.with.overflow intrinsics. For instance:</div><div style="font-size:13px"><div>      %res = call {i32, i1} @llvm.safe.sdiv.i32(i32 %a, i32 %b)</div>

<div>      %div = extractvalue {i32, i1} %res, 0</div><div>      %bit = extractvalue {i32, i1} %res, 1</div><div>      br i1 %bit, label %trap, label %normal</div></div><div style="font-size:13px"><br></div><div style="font-size:13px">

Now a few words about their implementation in LLVM. Although the new intrinsics look quite similar to the overflow ones, there are significant differences. One is that lowering the new intrinsics requires creating control flow, whereas for the existing ones it is sufficient to compute the overflow flag. The control flow is needed to guard the division operation, which could otherwise cause undefined behaviour.</div>
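<div style="font-size:13px"><br></div><div style="font-size:13px">For illustration, the guarded expansion could look roughly like the following for the i32 signed-division case. This is a hand-written sketch of the rules listed above, not the IR the patch actually emits; the function name and all value names are made up for the example.</div>
<pre style="font-size:13px">
; Hypothetical hand expansion of llvm.safe.sdiv.i32 (names are illustrative).
define { i32, i1 } @safe_sdiv_i32_expanded(i32 %a, i32 %b) {
entry:
  ; Rule 1: a zero divisor is unsafe.
  %is.zero = icmp eq i32 %b, 0
  ; Rule 2: min / -1 overflows.
  %is.min = icmp eq i32 %a, -2147483648
  %is.m1 = icmp eq i32 %b, -1
  %ovf = and i1 %is.min, %is.m1
  %unsafe = or i1 %is.zero, %ovf
  br i1 %unsafe, label %guarded, label %divide

divide:
  ; Only reached when the division is well defined.
  %q = sdiv i32 %a, %b
  br label %done

guarded:
  ; 0 for division by zero; otherwise %a, which is min in the overflow case.
  %special = select i1 %is.zero, i32 0, i32 %a
  br label %done

done:
  %val = phi i32 [ %q, %divide ], [ %special, %guarded ]
  %flag = phi i1 [ false, %divide ], [ true, %guarded ]
  %r0 = insertvalue { i32, i1 } undef, i32 %val, 0
  %r1 = insertvalue { i32, i1 } %r0, i1 %flag, 1
  ret { i32, i1 } %r1
}
</pre>
<div style="font-size:13px">The branch around the sdiv is the part that has no counterpart in the lowering of the existing overflow intrinsics.</div>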

<div style="font-size:13px"><br></div><div style="font-size:13px">The existing intrinsics are lowered in the backend, during legalization. Doing the same for the new ones would require a more complicated implementation, because new control flow has to be created, and it would have to be repeated in every backend.</div>

<div style="font-size:13px"><br></div><div style="font-size:13px">The alternative is to lower the new intrinsics in the CodeGenPrepare pass. That approach looks more convenient to me, because it allows us to have a single implementation for all targets in one place, and it’s easier to introduce control flow at this point.</div>

<div style="font-size:13px"><br></div><div style="font-size:13px">The patch below implements the second alternative. Along with a straightforward lowering (which is valid on all platforms and can serve as a baseline), the lowering performs some simple optimizations, which I think are also easier to implement in CodeGenPrepare than on DAGs:</div>

<div style="font-size:13px"><ol><li>We don’t generate code for unused parts of the result structure.</li><li>If the div instruction on the given platform already behaves exactly as the intrinsic requires (as is the case on ARM64, for example), we don’t guard the div instruction. As a result, we can avoid branches altogether if the second part of the result structure is not used.</li>

<li>The most likely users of the result structure are extractvalue instructions. With that in mind, we try to propagate the results; in most cases this lets us get rid of all the corresponding extractvalues.</li></ol></div>
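<div style="font-size:13px"><br></div><div style="font-size:13px">As a rough illustration of the first optimization, if only the flag of an i32 signed safe division is used, the expansion presumably reduces to the checks alone, with no division emitted. The sketch below is hand-written under that assumption and is not output from the patch; the function name and value names are hypothetical.</div>
<pre style="font-size:13px">
; Hypothetical flag-only expansion: the quotient is unused, so no sdiv is needed.
define i1 @safe_sdiv_i32_flag_only(i32 %a, i32 %b) {
entry:
  ; Division by zero.
  %is.zero = icmp eq i32 %b, 0
  ; min / -1 overflow.
  %is.min = icmp eq i32 %a, -2147483648
  %is.m1 = icmp eq i32 %b, -1
  %ovf = and i1 %is.min, %is.m1
  %unsafe = or i1 %is.zero, %ovf
  ret i1 %unsafe
}
</pre>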

<div style="font-size:13px"><br></div><div style="font-size:13px">Attached are two patches: the first one with the described implementation and tests, and the second one with the corresponding documentation changes.</div>

<div style="font-size:13px"><br></div><div style="font-size:13px">The first patch has already landed on trunk, but the topic is still open, and any suggestions are welcome.</div><div style="font-size:13px"><div><br></div>

<div></div></div></div><br><div style="word-wrap:break-word"><div style="font-size:13px"><div></div></div></div><br><div style="word-wrap:break-word"><div style="font-size:13px"><div></div><div><br></div><div>Best regards,</div>

<div>Michael</div></div></div><br>

<br></blockquote></div><br></div>