[PATCH] D26348: Allow convergent attribute for function arguments

Mehdi AMINI via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Dec 8 09:35:58 PST 2016


mehdi_amini added a comment.

In https://reviews.llvm.org/D26348#616989, @nhaehnle wrote:

> Thank you for taking a look.
>
> I mostly agree about the use of 'divergence', I think the places where it's necessary can be filled in using the language of executions. However, there's definitely precedent for talking about divergence of data in LLVM. The DivergenceAnalysis pass explicitly determines divergence vs. uniformity of data. The purpose is ultimately to decide whether control flow is uniform or divergent, but to decide that, it obviously has to look at the divergence of data as well, because branch conditions are data.


To clarify: there are two parts to my comments:

1. "divergence" is not common in LLVM. I stand on this point: the DivergenceAnalysis is a GPU specific pass and is hardly part of the `core semantic` of the IR.
2. "divergence" as it is usually encounter is about control flow and not data flow.

About 2: since you're pointing at the DivergenceAnalysis pass, note that its header states `This file implements divergence analysis which determines whether a branch in a GPU program is divergent.` My understanding of divergence in this context has always been control-flow divergence, which does not "break" SSA.
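
For concreteness, here is a minimal CUDA sketch (my own illustration, not from the patch) of that usual notion: the branch condition is per-thread data, so DivergenceAnalysis has to track the divergence of values, but what it ultimately decides is whether *control flow* is uniform across the wave.

  __global__ void kernel(float *out, const float *in) {
    int tid = threadIdx.x;        // per-thread value: "divergent" data
    if (tid & 1) {                // divergent branch: odd and even lanes of
      out[tid] = in[tid] * 2.0f;  // the same wave take different paths
    } else {
      out[tid] = in[tid] + 1.0f;
    }
  }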

I mean that, unless I missed something here, the usual notion of divergence does not prevent the folding we took as an example in the LangRef for this patch:

  if (cond) {
    Tmp1 = Foo(v [convergent])
  } else {
    Tmp2 = Foo(v [convergent])
  }
  Tmp3 = phi [Tmp1, Tmp2]

into

  Tmp3 = Foo(v [convergent])

I'm looking at `sinking may add divergence even when the argument is the same IR value in all blocks`: here there is no control-flow divergence, and I don't believe a GPU programmer would say that the transformation above "could create divergence".
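
To make this concrete, a hedged CUDA sketch of my own, using the pre-CUDA-9 `__ballot` intrinsic (a convergent operation in the current sense): sinking the call changes its *result*, because fewer threads execute it, but nothing in either version is what a GPU programmer would call divergent; the mask stays uniform among the threads that compute it.

  __global__ void kernel(unsigned *out) {
    unsigned mask = __ballot(1);   // whole wave executes this: full mask
    if (threadIdx.x < 16) {
      out[threadIdx.x] = mask;     // sinking __ballot(1) into this block
    }                              // would yield a mask of lanes 0..15 only,
  }                                // yet still uniform among those lanes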

(Of course I understand the comment, and it makes perfect sense to me; but even though I have been following this patch it is hard to judge, which is why I asked @chandlerc or @sanjoy to take another look at this.)



================
Comment at: docs/LangRef.rst:1149
+    operation must have the same value across all simultaneously executing
+    threads. Such arguments are called ``convergent``.
+
----------------
nhaehnle wrote:
> mehdi_amini wrote:
> > I don't like this paragraph, especially the use of "threads", which in general does not imply a lock-step model. It seems like a loose definition that makes little sense to anyone not versed in SIMT and the GPU execution model.
> > 
> "threads" is literally the term used by AMD for the thing we're talking about: threads executing in lock-step within a single "wave". It would be counter-productive to deviate from that.
> 
> And yes, "thread" doesn't always imply lock-step, but the paragraph does explicitly talk about "operations that are executed simultaneously for multiple threads". Maybe this would help drive the point home even more:
> 
> > In some parallel execution models, there exist operations that are executed by a single hardware execution unit on behalf of multiple threads simultaneously and one or more parameters of the operation ...
> 
> ... but I was previously encouraged to stay away from talking about hardware in the LangRef, so I have to say that this feels increasingly like bike-shedding to me.
> 
> I'd actually be happy to use language that admits that at the end of the day, this is all about doing something useful with hardware, because I think it helps understand the intuition behind the definition. But the back-and-forth needs to converge at some point (in the limit sense, not in the attribute sense ;-)).
The fact that AMD is using "threads" is irrelevant to me: we're not writing AMDGPU documentation here. This is why the use of `threads` alone seems wrong and misleading to me in the context of this document. (My point wouldn't stand in a GPU-centric document, but the GPU model is not part of the LangRef right now.)

Otherwise, I would prefer that we spell it out clearly, something along these lines:

```
GPUs (and other similar hardware) are frequently based on a SIMT (or SPMD)
model, where a single hardware execution unit executes identical operations
simultaneously on behalf of multiple threads of execution. Some of these
operations require that one or more arguments have the same value across
all simultaneously executing threads. Such arguments are called
``convergent``.
```
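
As a concrete illustration of that wording (a hypothetical, CUDA-flavored example of mine, not from the patch): a wave-wide broadcast whose source lane has to be the same for every thread would make `srcLane` a ``convergent`` argument.

```
__device__ float broadcast(float v, int srcLane) {
  // Hypothetical contract: srcLane is a ``convergent`` argument. The
  // shuffle is executed by one SIMT unit on behalf of all threads of the
  // wave, and the broadcast is only meaningful if every active thread
  // passes the same srcLane.
  return __shfl(v, srcLane);  // pre-CUDA-9 warp shuffle
}
```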




https://reviews.llvm.org/D26348




