[PATCH] D26348: Allow convergent attribute for function arguments

Tue Nov 8 10:00:25 PST 2016

jlebar added inline comments.

================
Comment at: docs/LangRef.rst:1168
+    through the transformed program (with the same input / initial conditions)
+    must also be compatible.
+
----------------
nhaehnle wrote:
> jlebar wrote:
> > This is going to be the paragraph to wordsmith.
> > 
> > I think the approach of saying "A correct transformation has the following property: For any two runs of the program r1 and r2, if r1 and r2 were compatible before the transformation, they must be compatible after the transformation" makes some sense as a way to import SPMD semantics into the langref.
> > 
> > But I am not sure about the definition of 'compatible'.  For one thing, this doesn't work if the transformation adds or removes a callsite with convergent parameters.  But you're sometimes allowed to do this.  For example, you could unroll a loop which contains a convergent-parameter call.
> > 
> > This is really tricky...
> So, the intention is for the language to allow the addition of callsites in loop unrolling.
> 
> Compatible means that at every call site, the sequences of actual convergent function arguments are equal or one is a subset of the other.
> 
> If you add a callsite, you have to make sure that this property holds. In the case of loop unrolling, I think the constraint ends up being similar to what happens for convergent functions, i.e. as long as you know that the loop has a multiple of N iterations, you can unroll the loop N times. The effect is that the sequence of actual function arguments at the call site in the loop from run r1 on the original program is split into N sequences of actual function arguments, one for each of the N call sites in the loop-unrolled program.
> 
> If the sequences in r1 and r2 were equal originally, then they're trivially equal afterwards. If the sequence in r1 was a subsequence of the sequence in r2, then this must mean that the loop took fewer iterations in r1 than in r2. Then it follows that the sequences in r1 are still subsequences of the corresponding sequences in r2.
> 
> (I think this approach is fine for the proof that unrolling exactly N times satisfies the condition. For the reverse proof that loop transformations satisfying the condition actually do the right thing with respect to the real SIMT model, I'm cheating slightly because there could be an outer loop so that there is not necessarily a clear implication from subsequence -> fewer iterations, but then e.g. use the freedom to choose r1 and r2 in such a way that the outer loop is run through only once.)
Oh, I see what you mean now!  Okay.  My problem was I was trying to compare call sites before and after the transformation, but you only compare calls made within one piece of code (i.e. either before compared to before or after compared to after).

Now that I've slept on it (and understood my mistake here), I think that reasoning about how programs run is going to be pretty complicated -- to the point that I am still not 100% sure this is correct.  If we can agree on a data-flow formulation, I'd probably find that a lot easier to reason about myself.  It sounds like Mehdi doesn't have as much of a problem with it as you might have thought?

https://reviews.llvm.org/D26348