[llvm-dev] [RFC] Adding thread group semantics to LangRef (motivated by GPUs)

Thu Dec 20 02:56:46 PST 2018

December 20, 2018 1:15 AM, "Justin Lebar via llvm-dev" <llvm-dev at lists.llvm.org> wrote:

> We already have the notion of "convergent" functions like syncthreads(), to which we cannot add
> control-flow dependencies. That is, it's legal to hoist syncthreads out of an "if", but it's not
> legal to sink it into an "if". It's not clear to me why we can't have "anticonvergent" (terrible
> name) functions which cannot have control-flow dependencies removed from them? ballot() would be
> both convergent and anticonvergent.
> 
> Would that solve your problem?

One could fold both these constraints into the single existing "convergent" attribute, and say that the control-flow dependencies of a convergent function call cannot be trifled with. But there is more: is it okay to sink a syncthreads() call into an "if", if the "if" (heh!) is known to be uniform, i.e., all threads are guaranteed to take the same side of the branch? This may not be important for syncthreads, but it may be useful for calls like ballot() that produce or use values. That would mean that the convergent attribute should really constrain /divergent/ control-flow dependencies. It would be nice to have a single-threaded way to say all of this.
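
To make the uniform-versus-divergent distinction concrete, here is a toy lane-level model in Python. It is only a sketch under stated assumptions: `ballot`, `ballot_before_if` and `ballot_sunk_into_if` are hypothetical helpers invented for illustration, a "warp" is just a list of lane ids, and nothing here is LLVM or CUDA API. It compares the value each lane sees when the ballot runs before the branch against the value it sees when the ballot is sunk into the branch.

```python
def ballot(active, pred):
    """Toy ballot: bitmask of the active lanes whose predicate is true."""
    mask = 0
    for lane in active:
        if pred[lane]:
            mask |= 1 << lane
    return mask


def ballot_before_if(lanes, pred, cond):
    # The ballot runs before the branch, so every lane participates.
    # Only lanes that take the branch go on to use the result.
    mask = ballot(lanes, pred)
    return {lane: mask for lane in lanes if cond[lane]}


def ballot_sunk_into_if(lanes, pred, cond):
    # The ballot runs inside the branch, so only lanes that take it participate.
    taken = [lane for lane in lanes if cond[lane]]
    mask = ballot(taken, pred)
    return {lane: mask for lane in taken}


lanes = [0, 1, 2, 3]
pred = [True, False, True, True]

uniform = [True, True, True, True]       # all lanes take the branch
divergent = [True, True, False, False]   # only lanes 0 and 1 take it

# Uniform branch: sinking does not change what any lane observes.
same = ballot_before_if(lanes, pred, uniform) == ballot_sunk_into_if(lanes, pred, uniform)

# Divergent branch: lanes 0 and 1 observe a different mask after sinking.
hoisted = ballot_before_if(lanes, pred, divergent)
sunk = ballot_sunk_into_if(lanes, pred, divergent)
```

In this model the sinking transformation is harmless exactly when the branch condition is uniform, which is the case the paragraph above argues the attribute should permit.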

>> However, the basic block containing the ballot call in the natural lowering to LLVM IR is not
>> part of the loop at all. The information that it was intended to be run as part of the loop is
>> currently lost forever.
> 
> Sounds like the natural lowering of this example is not respecting anticonvergence and just needs
> to be made more unnatural?

Right. We probably can't change the way control flow is represented in LLVM. But we could have a way to mark the extra blocks at the exit of the loop as being special. Optimizations will need to be aware of this when moving convergent functions. This is similar to using header/merge blocks in SPIR-V to demarcate the "extended" body of a loop. The tricky part is that the loop condition will now need to be a control-flow dependency for any convergent calls in those extra blocks!
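
A toy sketch of why those exit blocks matter, again in Python and again purely illustrative: `ballot_true`, `ballot_in_loop_exit` and `ballot_after_reconvergence` are hypothetical names, and the per-lane trip counts are made-up data. The first function models the intended behaviour, where lanes leaving the loop in the same iteration ballot together. The second models what happens if the exit block is treated as lying outside the loop, so that all lanes reconverge before the ballot.

```python
def ballot_true(active):
    """Toy ballot(true): bitmask of the lanes that execute the call together."""
    mask = 0
    for lane in active:
        mask |= 1 << lane
    return mask


def ballot_in_loop_exit(trip_counts):
    # Intended semantics: the exit block is part of the "extended" loop body,
    # so only lanes that leave in the same iteration see each other.
    result = {}
    running = set(range(len(trip_counts)))
    i = 0
    while running:
        leaving = {lane for lane in running if i >= trip_counts[lane]}
        if leaving:
            mask = ballot_true(leaving)
            for lane in leaving:
                result[lane] = mask
            running -= leaving
        i += 1
    return result


def ballot_after_reconvergence(trip_counts):
    # Exit block treated as outside the loop: every lane finishes the loop,
    # the lanes reconverge, and then all of them ballot together.
    lanes = range(len(trip_counts))
    mask = ballot_true(lanes)
    return {lane: mask for lane in lanes}


trip_counts = [1, 3, 1, 2]  # hypothetical per-lane loop trip counts
in_loop = ballot_in_loop_exit(trip_counts)
reconverged = ballot_after_reconvergence(trip_counts)
```

With these trip counts, lanes 0 and 2 leave together and see only each other, while lanes 1 and 3 each leave alone. After reconvergence every lane sees the full mask, so the two placements are observably different, and the loop condition effectively becomes a control-flow dependency of the ballot.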

> I also think it's worthwhile to consider the reasons behind nvidia's move away from functions like
> ballot() towards explicit tracking of which threads are active.

Ack for Olivier's email about *_sync and co-operative groups. Having a new programming model is quite useful. But what we are looking for is a generic way to support existing programming models in LLVM while limiting how invasive the changes will need to be.

Sameer.