[LLVMdev] Proposal: add intrinsics for safe division

Fri Apr 25 13:44:35 PDT 2014

Thanks for the writeup! It's very helpful.

On Fri, Apr 25, 2014 at 11:49 AM, Filip Pizlo <fpizlo at apple.com> wrote:

>
> On April 25, 2014 at 10:48:18 AM, Reid Kleckner (rnk at google.com) wrote:
>
> On Fri, Apr 25, 2014 at 10:19 AM, Filip Pizlo <fpizlo at apple.com> wrote:
>>
>> The sdiv operation in LLVM IR only makes sense for C and its very direct
>> relatives.  The amount of control flow necessary to represent a safe
>> division for any other language is ghastly.  (a/b) becomes something like
>> (b != 0 ? ((a != INT_MIN || b != -1) ? a / b : INT_MIN) : 0).  It's more
>> useful to the optimizer to see this as the single operation that it really
>> is.
>>
> This argument makes sense to me. It would be hard for the arm64 backend to
> pattern match all this control flow back into a single instruction after
> optimizations.
>
> I want to soften this statement a bit: while ISel can't look across basic
blocks, we could probably pattern match away the branches in the
SelectionDAGBuilder, because it can see the original use-def chain for the
result of the division. This would take a little work, but I can see how it
would work and I've only touched SelectionDAGBuilder a few times.

> So, it seems that for JavaScript and Java on ARM - and probably any
> environment with emulated division, these intrinsics would result in
> tighter code without requiring target intrinsics.  For all other
> language/hardware combinations, the main benefit here is to materialize the
> gnarly control flow later and so hopefully reveal more optimization
> opportunities.  That seems like a pretty good sweet-spot for the systems
> that I know about.
>
If we're going to emit the control flow anyway, then we actually want to
expose the control flow to the mid-level optimizers as early as possible.
 For example, on a non-ARM64 ISA, if the user checked that the divisor was
zero already, we can eliminate the zero check branch.  We could also
unswitch this loop:

int b = ...;
for (int i = 0; i < n; ++i)
  c[i] = a[i] / b;

Lowering this with branches produces:

for (int i = 0; i < n; ++i) {
  if (b == 0) c[i] = 0;
  else if (b == -1 && a[i] == INT_MIN) c[i] = INT_MIN;
  else c[i] = a[i] / b;
}

b is loop invariant, and we could unswitch this on non-AArch64:

if (b == 0) for (int i = 0; i < n; ++i) c[i] = 0;
else if (b == -1) for (int i = 0; i < n; ++i) c[i] = (a[i] == INT_MIN) ?
INT_MIN : a[i] / b;
else for (int i = 0; i < n; ++i) c[i] = a[i] / b;

Note that the middle branch goes away for unsigned division.  It seems to
me that the branches are better all around, at least for other ISAs.

As for generality, every language except asm.js (or
JS-with-i32-coercion-hints) has to branch on the "overflow" bit of the
intrinsic anyway.

There's also the discussion of iload and istore, where the consensus was
that we'd rather have explicit null checks and branches.

I'm also not excited by the prospect of teaching all the mid-level
optimizers about the arithmetic properties of safe.[us]div, which has been
a big drawback to the other overflow checked arithmetic intrinsics.

So, in conclusion, I think we should at least *try* to pattern match out
the branches in SDAGBuilder for ARM64 before giving up and adding this
intrinsic, or something like it.  Does that sound reasonable?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140425/6477928e/attachment.html>