[LLVMdev] Proposal: add intrinsics for safe division

Mon May 5 05:58:45 PDT 2014

Hi Andy, Philip, Filip, Eric and others,

I didn’t want to talk about patchpoints, because I don’t know much about them.

Let me comment some remarks regarding the current proposal though:

I do request the following changes:
- Mark it clearly as experimental.
I agree.

- Either don't specify the value computed in the edge cases, or allow those values to be specified as constant arguments to the call.  This allows efficient lowering to x86's div instruction if you want to make use of the trapping semantics.
Actually, I don’t see much use of this. Frontends like Java and Go will still have to generate calls with explicit constant for the edge cases, making them equivalent to the currently proposed intrinsics.

Given:
void foo(int[] a, int d) {
  for i = 0....a.length {
    a[i] = a[i]/d
  }
}
I want:
void foo(int[]a, int d) {
  // A single check here, not one per iteration
  if( a.length > 0 && d == 0 ) throw;
  for i = 0...a.length {
    // arguably, it might be better to split the loop for d == -1
    if( a[i] == INT_MIN && d == -1 ) {
       continue;
    }
    a[i] = x86_div(a[i], d);
  }
}
I understand your concern about the checks on x86 - in this example it really might be useful to have explicit CFG to help LICM to hoist the check out of the loop. I suggest to address this concern by adding mechanism to lower the intrinsics early (as proposed by Andy as well). Along with that, we can try to teach loop optimizations to deal with the intrinsics, and maybe later we won’t need this early-lowering at all (but while it’s needed it’ll be there). Thus, at the beginning, when the optimizations know nothing about the intrinsics, we’ll just lower them early for some targets (x86) and keep until CGP for the others (ARM64). Also, that will allow us to compare performance with early lowering vs CGP lowering.

Does that sound reasonable?

Thanks,
Michael

On May 3, 2014, at 4:45 AM, Andrew Trick <atrick at apple.com> wrote:

> 
> On May 2, 2014, at 5:01 PM, Philip Reames <listmail at philipreames.com> wrote:
> 
>> 
>> On 05/02/2014 04:46 PM, Andrew Trick wrote:
>>> 
>>> On May 2, 2014, at 4:25 PM, Philip Reames <listmail at philipreames.com> wrote:
>>> 
>>>> 
>>>> I'm still not happy with hiding the explicit control flow, but I can achieve the same effect as:
>>>> if( div by zero ) throw
>>>> (r, o, d) = safe.div(a,b);
>>>> if( o ) {
>>>>   r = a;
>>>> }
>>>> 
>>>> i.e. emit a separate guard branch for the interesting condition and not utilize the value from the safe div.  
>>> 
>>> Oh, I was just assuming you would deopt on either div-by-zero or overflow in order to optimize the common case.
>>> 
>>> So, the point of safe.div is that you don’t need control flow in the IR except to indicate the throw.
>>> 
>>> (r, o, d) = safe.div(a,b)
>>> if (d) { patchpoint(throw, state) }
>>> 
>>> You could also explicitly check div-by-zero but it would be harder to fold away that extra check during lowering.
>> I think I may have worded this badly.  The motivation for the explicit guard is to enable LICM and other branch simplifications.  This point is completely separate from the choice to deopt or not.  
>> 
>> Consider the example below.  This is specifically for x86 with it's trapping divide.  Assume I am not utilizing traps, but need to guard the divide to handle the two special cases.  The exact handling of the edge cases isn't really significant, but the guards must exist.  
> 
> Yes, I completely understand what you want. I would like to hoist runtime traps even more aggressively than what LLVM will do when the control flow is explicit, but that has nothing to do with safe.div. We’ll discuss it later in a patchpoint thread.
> 
> The issue of loop unswitching came up earlier in the thread. All these options are good
> - generate fully explicit control flow + sdiv
> - generate partially explicit control flow + safe.div
> - expand safe.div early based on TTI
> - teach unswitching about safe.div
> 
> The only thing I did not particularly like was the need for frontends to generate target specific intrinsics for a common operation.
> 
> -Andy
> 
>> Given:
>> void foo(int[] a, int d) {
>>   for i = 0....a.length {
>>     a[i] = a[i]/d
>>   }
>> }
>> 
>> I want:
>> void foo(int[]a, int d) {
>>   // A single check here, not one per iteration
>>   if( a.length > 0 && d == 0 ) throw;
>>   for i = 0...a.length {
>>     // arguably, it might be better to split the loop for d == -1
>>     if( a[i] == INT_MIN && d == -1 ) {
>>        continue;
>>     }
>>     a[i] = x86_div(a[i], d);
>>   }
>> }
>> 
>> Not:
>> void foo(int[]a, int d) {
>> 
>>   for i = 0...a.length {
>>     if( a[i] == INT_MIN && d == -1 ) {
>>        continue;
>>     }
>>     if( d == 0 ) throw;
>>     a[i] = x86_div(a[i], d);
>>   }
>> }
>> 
>> Hopefully, this helps clarify my point.  
>> 
>> Philip

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140505/89730cee/attachment.html>