[llvm-dev] [RFC] Refinement of convergent semantics

Mon Sep 14 20:23:08 PDT 2015

> On Sep 14, 2015, at 3:41 PM, Philip Reames <listmail at philipreames.com> wrote:
> 
> 
> 
> On 09/14/2015 12:30 PM, Owen Anderson wrote:
>>> On Sep 14, 2015, at 12:15 PM, Philip Reames <listmail at philipreames.com> wrote:
>>> 
>>> On 09/04/2015 01:25 PM, Owen Anderson via llvm-dev wrote:
>>>> Hi all,
>>>> 
>>>> In light of recent discussions regarding updating passes to respect convergent semantics, and whether or not it is sufficient for barriers, I would like to propose a change in convergent semantics that should resolve a lot of the identified problems regarding loop unrolling, loop unswitching, etc.  Credit to John McCall for talking this over with me and seeding the core ideas.
>>>> 
>>>> Today, convergent operations may only be moved into control-equivalent locations, or, in layman’s terms, a convergent operation may neither be sunk into nor hoisted out of, a condition.  This causes problems for full loop unrolling, as the control dependence on the loop counter is eliminated, but our intuition indicates that this dependence was somehow trivial.  More concretely, all know uses of convergent are OK with full unrolling, making this semantic undesirable.  Related problems arise in loop unswitching as well.
>>> I don't understand this point.  Loop unrolling specifically won't change which indices actually run.  It might result in code duplication with a subset of indices taken one of two paths.  Does today's convergent also imply no-duplicate?  Is that what you're trying to relax?
>> The definition today says that we cannot remove a control dependence.  Since the loop counter is eliminated entirely, one can argue that we have eliminated the control dependence on it.  I agree that there’s an intuitive argument that the dependence on the loop counter was trivial, but I have no idea how to formalize that.
> I'm still not getting the challenge.  I can write a loop unroll as a two step transform.  1) Introduce another predicated copy of the loop, reduce the iteration count by 1/2.  2) Try to prove the predicated version always execute.  (1) involves code duplication, but doesn't change the dynamic branches executed along any path or the state of any index when hitting the convergent operation.  (2) is merely constant folding and/or CSE.  While it reduces the dynamic branches, it does so in a very structured way.
> 
> Is the problem purely that our current definition is a purely static one?  Does our current definition allow (1) above?
>> 
>> While resolving the question re: loop unrolling is nice, I actually thinking providing answers on which loop unswitching transforms are legal is actually the more novel part of this change.
> I think you can probably break down the argument the same as I did for loop unswitch.  Step 1 - Introduce a new control dependency which doesn't change the state of any index along any path.  Step 2 - Prove away a trivial branch.
> 
> Having said that, I don't care enough about the "convergent" attribute to block you here.  Do what you want.  :)

The problem in both proposed transformations sequences is that you have to add a new control dependence, which is not allowed by either the old or new definitions of convergent.  Coming up with a definition that allows it will be very difficult without introducing more details of SPMD programming models, since the “real” constraint is that only uniform control dependences can be added.  That happens to be true in both cases outlined here (since the loop count is thread-invariant), but would not necessarily be true in other transformations.

However, we can achieve the same end result without needing to introduce more details of SPMD programming into LLVM IR by removing the rule preventing the removal of control dependences.  That’s basically what I’m proposing here.

>> 
>>>> 	nospeculate - this operation may not be added to any program trace on which it was not previously executed (same as notrap?)
>>> Isn't this already true of all instructions?  Unless we can *prove* that speculating an instruction can't introduce faults, we can't speculate it ever.  An unknown call or intrinsic should already have this property right?
>> Possibly?  We probably need a safetospeculate attribute, then.
> The tricky part comes in defining what set of preconditions we want to encode in "safe".  An unconditionally safe to speculate function isn't very common.  So far, most of the cases I've seen come down to "safe if X" where X is "single argument which is non-null".  But that's pretty specific to my frontend.  I suspect we want to gather other inputs here.

I actually have plenty of intrinsics that are universally safe to speculate, because they are essentially convergent math operations.

—Owen