[LLVMdev] Proposal: add intrinsics for safe division

Filip Pizlo fpizlo at apple.com
Fri Apr 25 14:04:40 PDT 2014


On April 25, 2014 at 1:44:37 PM, Reid Kleckner (rnk at google.com) wrote:

Thanks for the writeup! It's very helpful.

On Fri, Apr 25, 2014 at 11:49 AM, Filip Pizlo <fpizlo at apple.com> wrote:

On April 25, 2014 at 10:48:18 AM, Reid Kleckner (rnk at google.com) wrote:

On Fri, Apr 25, 2014 at 10:19 AM, Filip Pizlo <fpizlo at apple.com> wrote:
The sdiv operation in LLVM IR only makes sense for C and its very direct relatives.  The amount of control flow necessary to represent a safe division for any other language is ghastly.  (a/b) becomes something like (b != 0 ? ((a != INT_MIN || b != -1) ? a / b : INT_MIN) : 0).  It's more useful to the optimizer to see this as the single operation that it really is.

This argument makes sense to me. It would be hard for the arm64 backend to pattern match all this control flow back into a single instruction after optimizations.
I want to soften this statement a bit: while ISel can't look across basic blocks, we could probably pattern match away the branches in the SelectionDAGBuilder, because it can see the original use-def chain for the result of the division. This would take a little work, but I can see how it would work and I've only touched SelectionDAGBuilder a few times. 
So, it seems that for JavaScript and Java on ARM - and probably any environment with emulated division, these intrinsics would result in tighter code without requiring target intrinsics.  For all other language/hardware combinations, the main benefit here is to materialize the gnarly control flow later and so hopefully reveal more optimization opportunities.  That seems like a pretty good sweet-spot for the systems that I know about.

If we're going to emit the control flow anyway, then we actually want to expose the control flow to the mid-level optimizers as early as possible.
Not if you want to move code around.  Consider a loop that is too large to unswitch but that contains a "safe division" expressed as branches and that division is loop invariant.  What would happen?  My guess is that the optimizer would have a hard time with it.

  For example, on a non-ARM64 ISA, if the user checked that the divisor was zero already, we can eliminate the zero check branch.  We could also unswitch this loop:

int b = ...;
for (int i = 0; i < n; ++i)
  c[i] = a[i] / b;

Lowering this with branches produces:

for (int i = 0; i < n; ++i) {
  if (b == 0) c[i] = 0;
  else if (b == -1 && a[i] == INT_MIN) c[i] = INT_MIN;
  else c[i] = a[i] / b;
}

b is loop invariant, and we could unswitch this on non-AArch64:

if (b == 0) for (int i = 0; i < n; ++i) c[i] = 0;
else if (b == -1) for (int i = 0; i < n; ++i) c[i] = (a[i] == INT_MIN) ? INT_MIN : a[i] / b;
else for (int i = 0; i < n; ++i) c[i] = a[i] / b;

Note that the middle branch goes away for unsigned division.  It seems to me that the branches are better all around, at least for other ISAs.
Sure, but that would be definitely suboptimal for JS on ARM64.  So if the claim is that ARM64 will optimize this by pattern matching in the backend then this is a clear example of where such pattern matching would be rather hard: an earlier optimization could inhibit our ability to do this effectively.



As for generality, every language except asm.js (or JS-with-i32-coercion-hints) has to branch on the "overflow" bit of the intrinsic anyway.
Heh, I wouldn't call it asm.js.  asm.js adopted something that was already a de-facto idiom.

But yes - most people want to be told if the denominator was zero but want INT_MIN/-1 to flow through gracefully and have defined behavior.



There's also the discussion of iload and istore, where the consensus was that we'd rather have explicit null checks and branches.
Lol.  Since you brought that up, I figure I might as well throw this out: probably some VMs will want a guaranteed trap on divide-by-zero as a free way of throwing the exception.  I actually think that the safediv intrinsic would ultimately make this easier.  The client would generate code like "if (extractelement (safediv (...), 1) == 1) trap" and that could be pattern-matched to mean "emit code that deterministically traps on divide by zero".  I don't think we have the vocabulary necessary to express this in IR.  But I think that matching this in the backend would be easier than matching "if (b == 0) then trap else do a division", since the division could be moved around independently of the branch.



I'm also not excited by the prospect of teaching all the mid-level optimizers about the arithmetic properties of safe.[us]div, which has been a big drawback to the other overflow checked arithmetic intrinsics.

So, in conclusion, I think we should at least *try* to pattern match out the branches in SDAGBuilder for ARM64 before giving up and adding this intrinsic, or something like it.  Does that sound reasonable?
My fear is that since the control flow is complex, the matching would get really nutty.  For example you could be clever for a/b and say "b + 1 >_unsigned 1 ? a / b : slow path".  I actually did that on some VM at some point.  Then there are all of the ways that the control flow could get messed up by optimization phases.  If you ask the frontend that generates LLVM IR to do this on its own then you'll end up with a lot of patterns on your hands.  It would take some effort to canonicalize control flow of this complexity to the point where the pattern matching would be reliable.  I think it's better if we give frontends that need this some way of expressing it to LLVM explicitly.  Note that this doesn't preclude LLVM from lowering the intrinsic as early as we deem appropriate.  But then the choice to do so is within LLVM rather than being delegated to the client, who is less likely to know about specifics like what LLVM can or can't pattern match.

In short, I agree with your observations that these intrinsics are not an obvious slam-dunk compared to making the explicit control flow, but I think that the intrinsics do give enough flexibility on the LLVM side that it would be great if front-ends used them rather than rolling the control flow themselves.

-Filip





-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140425/78d67f81/attachment.html>


More information about the llvm-dev mailing list