[PATCH] D26348: Allow convergent attribute for function arguments

Thu Dec 8 03:37:00 PST 2016

nhaehnle added a comment.

Thank you for taking a look.

I mostly agree about the use of 'divergence', I think the places where it's necessary can be filled in using the language of executions. However, there's definitely precedence for talking about divergence of data in LLVM. The DivergenceAnalysis pass explicitly determines divergence vs. uniformity of data. The purpose is ultimately to decide whether control flow is uniform or divergent, but to decide that, it obviously has to look at the divergence of data as well because branch conditions are data.

================
Comment at: docs/LangRef.rst:1163
+
+    Two runs r1 and r2 of a program are compatible (wrt convergent function
+    attributes) if for every call site CS with a convergent argument, the
----------------
mehdi_amini wrote:
> hfinkel wrote:
> > hfinkel wrote:
> > > nhaehnle wrote:
> > > > nhaehnle wrote:
> > > > > hfinkel wrote:
> > > > > > nhaehnle wrote:
> > > > > > > hfinkel wrote:
> > > > > > > > Please write out "with respect to." Also, you mean convergent function-parameter attributes, not function attributes.
> > > > > > > > 
> > > > > > > > The runs of the program terminology I find bothersome, in part because that's not what we're talking about. We're talking about the behavior of different threads within the same program. I think it would be better to say, "Two executions of the calling function, e1 and e2, are compatible..." Then we can say that transformations must preserve compatibility w.r.t. convergent function-parameter attributes for all potential executions of the calling function. We might even further specify that this is for potential simultaneous executions.
> > > > > > > > 
> > > > > > > > Finally, I'd really like it if you would include the example we've been discussing in the reference as a non-normative example. I did not really understand the definition until I read @nhaehnle's explanation above.
> > > > > > > I'm only one person :)
> > > > > > > 
> > > > > > > I'm making the editing changes and adding the example.
> > > > > > > 
> > > > > > > s/run/execution/ is fine with me, and I'm adding that as an alternative below, but whenever I try to formulate some text that talks about executing a calling function rather than executing a function, I feel confused about the calling vs. the callee function.
> > > > > > > 
> > > > > > > Perhaps you can clarify:
> > > > > > > 
> > > > > > > > The runs of the program terminology I find bothersome, in part because that's not what we're talking about. We're talking about the behavior of different threads within the same program.
> > > > > > > 
> > > > > > > What's the difference between threads and executions in this context?
> > > > > > > but whenever I try to formulate some text that talks about executing a calling function rather than executing a function, I feel confused about the calling vs. the callee function.
> > > > > > 
> > > > > > Feel free to establish some terminology early in the definition for later use. We're talking about a function-parameter attribute. That attribute must appear on the function parameters of some call site. That call site exists in some function (the caller a.k.a. the calling function) and calls some other function (the callee function).
> > > > > > 
> > > > > > > What's the difference between threads and executions in this context?
> > > > > > 
> > > > > > As I think about it, a thread is some entity that executes things (e.g. functions). Threads execute functions, and each time a thread executes some function, that constitutes an execution of that function.
> > > > > > 
> > > > > I see. I played around with this some more, and here's an example I'd like to hear your thoughts on:
> > > > > 
> > > > >   define void @foo(i32 %a) {
> > > > >     ...
> > > > >     call void @bar(i32 %a)
> > > > >     ...
> > > > >   }
> > > > > 
> > > > >   define void @bar(i32 %a) {
> > > > >     call void @baz(i32 convergent %a)
> > > > >   }
> > > > > 
> > > > > This code has a potential problem because at least the parameter of bar should arguably have been labelled convergent.
> > > > > 
> > > > > I'd say that inlining shouldn't have to worry about the convergent attribute at all. However, if the definition we're discussing here takes a function-centric view, then we cannot prove in a target-independent way that inlining bar is correct (because foo gets a new callsite with a convergent function argument, where previously there were no such callsites, i.e. //every// pair of runs of foo was compatible). If the definition takes a program-centric view (i.e., executions that start at foo implicitly also consider the sequence of arguments passed into baz inside bar), then inlining bar is clearly correct.
> > > > > 
> > > > > That would be an argument in favour of staying with the language about "executions of a program" rather than "executions of a function".
> > > > Hmm, looking at other transforms of similar cases, I think we have to say that cases like this are undefined behaviour.
> > > > 
> > > > Curious, I think the same applies to the convergent attribute for functions: if a function F contains calls to a convergent function G, but is not itself marked convergent, then calls to F should be undefined behaviour, right? LangRef is currently somewhat silent on this particular point.
> > > The execution of a function semantically includes the execution of any functions called, so I still feel this is the better terminology. We're talking about preventing transformations that invalidate executions of individual program executions, comparing different program executions is irrelevant, but each single program execution can contain multiple simultaneous executions of the functions in question. I think that the inlining would be correct regardless. Nothing that the inliner does will change the convergence properties of function arguments, as far as I can tell.
> > > 
> > > However, you also raise a second important point. If the first argument of `@bar` is not marked as convergent, then other optimizations around calls to `@bar` can ruin the convergent property regardless. Thus, it too needs to be marked.
> > > 
> > Regarding the convergent function attribute, yes. This is why Clang in CUDA mode marks all functions as convergent and then we remove the attribute when we prove it is not needed (in lib/Transforms/IPO/FunctionAttrs.cpp).
> What is the current status on this? How do we propagate/infer this attribute?
See the paragraph starting at line 1198 ("Calling a function F that contains a call-site CS..."). Also, to quote myself about what that paragraph has to say:

> this is fairly restrictive right now. E.g. one could
> ask what happens if the called function F passes a value that was loaded
> from memory to a convergent parameter. At least for AMDGPU purposes it's
> fine to leave that undefined, and probably will remain so for a long time.

We currently have no need to actually either infer or remove the attribute, but while the paragraph doesn't explicitly state it, it contains all the definitional power to work with and without a flow that does what is done for the convergent function attribute in some compilation flows: add it everywhere in the beginning, then remove it in places where it can be proved to be unneeded.

================
Comment at: docs/LangRef.rst:1149
+    operation must have the same value across all simultaneously executing
+    threads. Such arguments are called ``convergent``.
+
----------------
mehdi_amini wrote:
> I don't like this paragraph, and especially the use of "threads", which in general does not imply the lock-step model. It seems like a loose definition that makes little sense for anyone not versed into SIMT and GPU execution model.
> 
"threads" is literally the term used by AMD for the thing we're talking about: threads executing in lock-step within a single "wave". It would be counter-productive to deviate from that.

And yes, "thread" doesn't always imply lock-step, but the paragraph does explicitly talk about "operations that are executed simultaneously for multiple threads". Maybe this would help drive the point home even more:

> In some parallel execution models, there exist operations that are executed by a single hardware execution unit on behalf of multiple threads simultaneously and one or more parameters of the operation ...

... but I was previously encouraged to stay away from talking about hardware in the LangRef, so I have to say that this feels increasingly like bike-shedding to me.

I'd actually be happy to use language that admits that at the end of the day, this is all about doing something useful with hardware, because I think it helps understand the intuition behind the definition. But the back-and-forth needs to converge at some point (in the limit sense, not in the attribute sense ;-)).

================
Comment at: docs/LangRef.rst:1154
+    uniform across a (target-dependent) set of threads", the ``convergent``
+    attribute on function arguments can be thought of as "the value of this
+    argument at each execution of each call site to this function must be
----------------
mehdi_amini wrote:
> I believe: `s/function argument/function parameter/`  ( http://llvm.org/docs/LangRef.html#parameter-attributes )
> (Check the other places as well)
Ok, will do.

================
Comment at: docs/LangRef.rst:1156
+    argument at each execution of each call site to this function must be
+    uniform across a (target-dependent) set of threads".
+
----------------
mehdi_amini wrote:
> This paragraph isn't providing much either, refers to a definition of the  ``convergent`` attribute on functions that is not present elsewhere in the document, and is still using "uniform".
Ok, let's just kill it. Of all the fuss about this definition, the one thing that this paragraph was really meant to address hasn't ever come up, so... (although, perhaps it hasn't come up precisely because the paragraph served its purpose :))

================
Comment at: lib/Transforms/Utils/SimplifyCFG.cpp:1443
+    // because there may be divergence in the argument that was removed by the
+    // control flow conditions for those blocks.
+    if (isa<CallInst>(I0) || isa<InvokeInst>(I0)) {
----------------
jlebar wrote:
> I don't understand the "because" clause here; can we rephrase?
Yes, let's rephrase using the language of executions:

  // We cannot sink calls if any parameter is convergent: two executions that go
  // through different predecessor blocks are compatible with respect to
  // convergent parameters at the considered call-sites before sinking even if
  // the concrete values passed to the called function are different. After sinking,
  // they are no longer compatible, because different concrete values are passed
  // to the called function at the same call-site.

Or the strategically better variant:

  // We cannot sink calls if any parameter is convergent.

https://reviews.llvm.org/D26348