[PATCH] D26348: Allow convergent attribute for function arguments

Wed Dec 7 14:42:36 PST 2016

mehdi_amini added a comment.

`divergence` is used in multiple places in comments in the code, it isn't great as it is based on SIMT divergence. And even worse it is *data* divergence while we usually consider the control flow when talking about divergence on GPU. I don't like the fact that we spread this everywhere slowly.

I also think we should have someone else look at the LangRef (with a fresh look) change as well, the wording seems "correct" to me, but not necessarily easy to understand or easy to forge a mental model for the engineer working on optimizations on the LLVM IR. @sanjoy or @chandlerc maybe?

================
Comment at: docs/LangRef.rst:1163
+
+    Two runs r1 and r2 of a program are compatible (wrt convergent function
+    attributes) if for every call site CS with a convergent argument, the
----------------
hfinkel wrote:
> hfinkel wrote:
> > nhaehnle wrote:
> > > nhaehnle wrote:
> > > > hfinkel wrote:
> > > > > nhaehnle wrote:
> > > > > > hfinkel wrote:
> > > > > > > Please write out "with respect to." Also, you mean convergent function-parameter attributes, not function attributes.
> > > > > > > 
> > > > > > > The runs of the program terminology I find bothersome, in part because that's not what we're talking about. We're talking about the behavior of different threads within the same program. I think it would be better to say, "Two executions of the calling function, e1 and e2, are compatible..." Then we can say that transformations must preserve compatibility w.r.t. convergent function-parameter attributes for all potential executions of the calling function. We might even further specify that this is for potential simultaneous executions.
> > > > > > > 
> > > > > > > Finally, I'd really like it if you would include the example we've been discussing in the reference as a non-normative example. I did not really understand the definition until I read @nhaehnle's explanation above.
> > > > > > I'm only one person :)
> > > > > > 
> > > > > > I'm making the editing changes and adding the example.
> > > > > > 
> > > > > > s/run/execution/ is fine with me, and I'm adding that as an alternative below, but whenever I try to formulate some text that talks about executing a calling function rather than executing a function, I feel confused about the calling vs. the callee function.
> > > > > > 
> > > > > > Perhaps you can clarify:
> > > > > > 
> > > > > > > The runs of the program terminology I find bothersome, in part because that's not what we're talking about. We're talking about the behavior of different threads within the same program.
> > > > > > 
> > > > > > What's the difference between threads and executions in this context?
> > > > > > but whenever I try to formulate some text that talks about executing a calling function rather than executing a function, I feel confused about the calling vs. the callee function.
> > > > > 
> > > > > Feel free to establish some terminology early in the definition for later use. We're talking about a function-parameter attribute. That attribute must appear on the function parameters of some call site. That call site exists in some function (the caller a.k.a. the calling function) and calls some other function (the callee function).
> > > > > 
> > > > > > What's the difference between threads and executions in this context?
> > > > > 
> > > > > As I think about it, a thread is some entity that executes things (e.g. functions). Threads execute functions, and each time a thread executes some function, that constitutes an execution of that function.
> > > > > 
> > > > I see. I played around with this some more, and here's an example I'd like to hear your thoughts on:
> > > > 
> > > >   define void @foo(i32 %a) {
> > > >     ...
> > > >     call void @bar(i32 %a)
> > > >     ...
> > > >   }
> > > > 
> > > >   define void @bar(i32 %a) {
> > > >     call void @baz(i32 convergent %a)
> > > >   }
> > > > 
> > > > This code has a potential problem because at least the parameter of bar should arguably have been labelled convergent.
> > > > 
> > > > I'd say that inlining shouldn't have to worry about the convergent attribute at all. However, if the definition we're discussing here takes a function-centric view, then we cannot prove in a target-independent way that inlining bar is correct (because foo gets a new callsite with a convergent function argument, where previously there were no such callsites, i.e. //every// pair of runs of foo was compatible). If the definition takes a program-centric view (i.e., executions that start at foo implicitly also consider the sequence of arguments passed into baz inside bar), then inlining bar is clearly correct.
> > > > 
> > > > That would be an argument in favour of staying with the language about "executions of a program" rather than "executions of a function".
> > > Hmm, looking at other transforms of similar cases, I think we have to say that cases like this are undefined behaviour.
> > > 
> > > Curious, I think the same applies to the convergent attribute for functions: if a function F contains calls to a convergent function G, but is not itself marked convergent, then calls to F should be undefined behaviour, right? LangRef is currently somewhat silent on this particular point.
> > The execution of a function semantically includes the execution of any functions called, so I still feel this is the better terminology. We're talking about preventing transformations that invalidate executions of individual program executions, comparing different program executions is irrelevant, but each single program execution can contain multiple simultaneous executions of the functions in question. I think that the inlining would be correct regardless. Nothing that the inliner does will change the convergence properties of function arguments, as far as I can tell.
> > 
> > However, you also raise a second important point. If the first argument of `@bar` is not marked as convergent, then other optimizations around calls to `@bar` can ruin the convergent property regardless. Thus, it too needs to be marked.
> > 
> Regarding the convergent function attribute, yes. This is why Clang in CUDA mode marks all functions as convergent and then we remove the attribute when we prove it is not needed (in lib/Transforms/IPO/FunctionAttrs.cpp).
What is the current status on this? How do we propagate/infer this attribute?

================
Comment at: docs/LangRef.rst:1149
+    operation must have the same value across all simultaneously executing
+    threads. Such arguments are called ``convergent``.
+
----------------
I don't like this paragraph, and especially the use of "threads", which in general does not imply the lock-step model. It seems like a loose definition that makes little sense for anyone not versed into SIMT and GPU execution model.

================
Comment at: docs/LangRef.rst:1154
+    uniform across a (target-dependent) set of threads", the ``convergent``
+    attribute on function arguments can be thought of as "the value of this
+    argument at each execution of each call site to this function must be
----------------
I believe: `s/function argument/function parameter/`  ( http://llvm.org/docs/LangRef.html#parameter-attributes )
(Check the other places as well)

================
Comment at: docs/LangRef.rst:1156
+    argument at each execution of each call site to this function must be
+    uniform across a (target-dependent) set of threads".
+
----------------
This paragraph isn't providing much either, refers to a definition of the  ``convergent`` attribute on functions that is not present elsewhere in the document, and is still using "uniform".

https://reviews.llvm.org/D26348