[llvm-dev] [RFC] Adding thread group semantics to LangRef (motivated by GPUs)

Sat Dec 29 08:32:06 PST 2018

On 20.12.18 18:03, Connor Abbott wrote:
>     We already have the notion of "convergent" functions like
>     syncthreads(), to which we cannot add control-flow dependencies. 
>     That is, it's legal to hoist syncthreads out of an "if", but it's
>     not legal to sink it into an "if".  It's not clear to me why we
>     can't have "anticonvergent" (terrible name) functions which cannot
>     have control-flow dependencies removed from them?  ballot() would be
>     both convergent and anticonvergent.
> 
>     Would that solve your problem?
> 
> 
> I think it's important to note that we already have such an attribute, 
> although with the opposite sense - it's impossible to remove control 
> flow dependencies from a call unless you mark it as "speculatable". 

This isn't actually true. If both sides of an if/else contain the same 
non-speculatable function call, it can still be moved out of control flow.

That's because doing so doesn't change anything at all from a 
single-threaded perspective. This is why I think we should model the 
communication between threads honestly.
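
To make the single-threaded vs. multi-threaded difference concrete, here
is a minimal sketch of a toy lock-step model. Everything in it is an
assumption made for illustration: ballot() is modelled as returning a
bitmask of the lanes that execute it together, and the helper names
ballot_in_branches and ballot_hoisted are hypothetical. It is not LLVM
or hardware code.

```python
def ballot(active_lanes):
    """Toy model: bitmask of the lanes that execute this ballot together."""
    mask = 0
    for lane in active_lanes:
        mask |= 1 << lane
    return mask


def ballot_in_branches(cond):
    """if (cond) x = ballot(); else x = ballot();  (one call per side)"""
    lanes = range(len(cond))
    then_lanes = [lane for lane in lanes if cond[lane]]
    else_lanes = [lane for lane in lanes if not cond[lane]]
    result = [0] * len(cond)
    # Each side of the branch only sees the lanes that took that side.
    for side in (then_lanes, else_lanes):
        mask = ballot(side)
        for lane in side:
            result[lane] = mask
    return result


def ballot_hoisted(cond):
    """x = ballot();  (the identical call merged and moved out of the if/else)"""
    lanes = list(range(len(cond)))
    mask = ballot(lanes)  # all lanes are active at this point
    return [mask] * len(cond)


if __name__ == "__main__":
    divergent = [True, False, True, False]
    print(ballot_in_branches(divergent))  # per-lane results inside the branches
    print(ballot_hoisted(divergent))      # per-lane results after hoisting
    print(ballot_in_branches([True]), ballot_hoisted([True]))  # one lane only
```

With a single lane the two forms agree, which is why the transform looks
harmless from a single-threaded perspective. With four lanes and a
divergent condition the results differ.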

> However, this doesn't prevent
> 
> if (...) {
> } else {
> }
> foo = ballot();
> 
> from being turned into
> 
> if (...) {
>      foo1 = ballot();
> } else {
>      foo2 = ballot();
> }
> foo = phi(foo1, foo2)
> 
> and vice versa. We have a "noduplicate" attribute which prevents 
> transforming the first into the second, but not the other way around. Of 
> course we could keep going this way and add a "nocombine" attribute to 
> complement noduplicate. But even then, there are still problematic 
> transforms. For example, take this program, which is simplified from a 
> real game that doesn't work with the AMDGPU backend:
> 
> while (cond1 /* uniform */) {
>      ballot();
>      ...
>      if (cond2 /* non-uniform */) continue;
>      ...
> }
> 
> In SPIR-V, when using structured control flow, the semantics of this are 
> pretty clearly defined. In particular, there's a continue block after 
> the body of the loop where control flow re-converges, and the only back 
> edge is from the continue block, so the ballot is in uniform control 
> flow. But LLVM will get rid of the continue block since it's empty, and 
> re-analyze the loop as two nested loops, splitting the loop header in 
> two, producing a CFG which corresponds to this:
> 
> while (cond1 /* uniform */) {
>      do {
>          ballot();
>          ...
>      } while (cond2 /* non-uniform */);
>      ...
> }
> 
> Now, in an implementation where control flow re-converges at the 
> immediate post-dominator, this won't do the right thing anymore. In 
> order to handle it correctly, you'd effectively need to always flatten 
> nested loops, which will probably be really bad for performance if the 
> programmer actually wanted the second thing. It also makes it impossible 
> when translating a high-level language to LLVM to get the "natural" 
> behavior which game developers actually expect. This is exactly the sort 
> of "spooky action at a distance" which makes me think that everything 
> we've done so far is really insufficient, and we need to add an explicit 
> notion of control-flow divergence and reconvergence to the IR. We need a 
> way to say that control flow re-converges at the continue block, so that 
> LLVM won't eliminate it, and we can vectorize it correctly without 
> penalizing cases where it's better for control flow not to re-converge.

Well said!
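
For readers who want to see the effect, the following is a rough sketch
of the two loop shapes in a toy lock-step model. It rests on several
assumptions that are not from the quoted mail: ballot() returns a bitmask
of the lanes executing it together, cond1 is modelled as a fixed trip
count per lane, the inner loop also re-checks that trip count so the
model terminates, and lanes leaving the inner loop wait at its exit (the
immediate post-dominator). The names run_structured and run_nested are
hypothetical.

```python
def lane_mask(lanes):
    """Toy ballot: bitmask of the lanes that execute it together."""
    mask = 0
    for lane in lanes:
        mask |= 1 << lane
    return mask


def run_structured(cond2):
    """Single loop; lanes rejoin at the continue block before the back edge.

    cond2[lane][i] says whether `lane` takes `continue` in its i-th pass.
    Returns, per lane, the list of ballot results that lane observed.
    """
    n_lanes, n_iters = len(cond2), len(cond2[0])
    seen = [[] for _ in range(n_lanes)]
    for _ in range(n_iters):             # cond1: uniform trip count
        active = list(range(n_lanes))    # everyone reaches ballot() together
        mask = lane_mask(active)
        for lane in active:
            seen[lane].append(mask)
        # Lanes that take `continue` only skip the tail of the body.
    return seen


def run_nested(cond2):
    """Nested-loop CFG; lanes rejoin only at the exit of the inner loop."""
    n_lanes, n_iters = len(cond2), len(cond2[0])
    done = [0] * n_lanes                 # body passes completed per lane
    seen = [[] for _ in range(n_lanes)]
    while True:
        inner = [l for l in range(n_lanes) if done[l] < n_iters]  # cond1
        if not inner:
            break
        while inner:                     # do { ... } while (cond2)
            mask = lane_mask(inner)      # only the looping lanes vote
            stay = []
            for lane in inner:
                seen[lane].append(mask)
                took_continue = cond2[lane][done[lane]]
                done[lane] += 1
                if took_continue and done[lane] < n_iters:
                    stay.append(lane)
            inner = stay                 # the other lanes wait at the exit
    return seen


if __name__ == "__main__":
    # Two lanes, two passes each; lane 0 takes `continue` in its first pass.
    cond2 = [[True, False], [False, False]]
    print(run_structured(cond2))
    print(run_nested(cond2))
```

In this model every lane executes the same number of ballots in both
versions, but in the nested version a lane that took `continue` runs its
next ballot without the other lanes, so it sees a partial mask.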

Cheers,
Nicolai
-- 
Learn how the world really is,
but never forget how it should be.