[PATCH] D26348: Allow convergent attribute for function arguments

Thu Nov 17 01:57:19 PST 2016

nhaehnle added inline comments.

================
Comment at: docs/LangRef.rst:1147
+    In some parallel execution models, there exist operations with one or more
+    arguments that must be uniform across threads. Such arguments are called
+    ``convergent``.
----------------
mehdi_amini wrote:
> `uniform` isn't defined in the LangRef.
Right, this is also used in the paragraph below. This whole paragraph isn't intended to be normative, but to provide context that allows one to understand the actual condition. Perhaps this could be phrased better. What about:

"In some parallel execution models, there exist operations that are executed simultaneously for multiple threads, and one or more arguments of the operation must have the same value across all simultaneously executing threads."

================
Comment at: docs/LangRef.rst:1161
+    convergent argument at that call site in r1 and r2, respectively, satisfy
+    that S1 is a subsequence of S2 or vice versa.
+
----------------
mehdi_amini wrote:
> It is not clear to me what r1/S1 and r2/S2 are that would prevent CSE of:
> 
> ```
> if (cond) {
>   Tmp1 = Foo(v [convergent])
> } else {
>   Tmp2 = Foo(v [convergent])
> }
> Tmp3 = phi [Tmp1, Tmp2]
> ```
> 
> to:
> 
> ```
> Tmp3 = Foo(v[convergent])
> ```
> 
> But your definition references a "call site" as if there is a one-to-one mapping before/after a transformation.
The definition of "compatible" and everything inside it is completely independent of transformations. It talks about two runs of the //same// program.

Your example is forbidden for the following reason. In the original program, the runs `r1 = { cond = true, v = V0 }` and `r2 = { cond = false, v = V1 }` (with V0 != V1) are compatible: the first call-site of Foo has the sequences r1(CS) = ( V0 ) and r2(CS) = ( ), i.e. r2(CS) is a subsequence of r1(CS), and similarly for the second call-site. In the transformed program they are not compatible: the single call-site of Foo has the sequences r1(CS) = ( V0 ) and r2(CS) = ( V1 ), and neither is a subsequence of the other.
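
To make the argument concrete, here is a rough Python sketch of the check, not part of the patch. It assumes a run can be modelled as a map from call-site label to the sequence of values seen for the convergent argument, and that two runs are compatible when, at every call site, one sequence is a subsequence of the other. The labels `then`, `else` and `merged` and the helper names are hypothetical, chosen only for illustration.

```python
from typing import Dict, Sequence, Tuple

# Assumed model: a run maps each call-site label to the sequence of values
# observed for the convergent argument at that call site.
Run = Dict[str, Tuple[str, ...]]


def is_subsequence(short: Sequence, long: Sequence) -> bool:
    """Return True if `short` can be obtained from `long` by deleting elements."""
    it = iter(long)
    # Each element of `short` must be found in `long` after the previous match.
    return all(any(x == y for y in it) for x in short)


def compatible(r1: Run, r2: Run) -> bool:
    """Return True if, at every call site, one sequence is a subsequence of the other."""
    for cs in set(r1) | set(r2):
        s1, s2 = r1.get(cs, ()), r2.get(cs, ())
        if not (is_subsequence(s1, s2) or is_subsequence(s2, s1)):
            return False
    return True


# Original program: one call site per branch (hypothetical labels).
orig_r1: Run = {"then": ("V0",), "else": ()}    # cond = true,  v = V0
orig_r2: Run = {"then": (), "else": ("V1",)}    # cond = false, v = V1

# Transformed program: a single merged call site.
xform_r1: Run = {"merged": ("V0",)}
xform_r2: Run = {"merged": ("V1",)}

original_ok = compatible(orig_r1, orig_r2)
transformed_ok = compatible(xform_r1, xform_r2)
print(original_ok, transformed_ok)
```

Under this model the pair of runs is compatible in the original program and incompatible after the merge, which is exactly the situation the rule is meant to rule out.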

I do think that the language I used is pretty clear on that, actually, but since both you and @jlebar seem to have had the same difficulty, I'd like to understand that better. Perhaps it would help to swap those two paragraphs around as follows?

> Program transformations must ensure that every pair of runs that is compatible with respect to convergent function attributes in the original program is also compatible in the transformed program. Compatibility is defined as follows:
>
> Two runs r1 and r2 of a program are said to be compatible (wrt convergent function attributes) if ...

Maybe that helps the reader to parse the definition of compatibility because it //first// establishes a context where compatibility is about runs within a program rather than runs across multiple (transformed) versions of a program.

https://reviews.llvm.org/D26348