[PATCH] D85603: IR: Add convergence control operand bundle and intrinsics
Vinod Grover via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Nov 2 19:12:46 PST 2020
vgrover99 added a comment.
In D85603#2361168 <https://reviews.llvm.org/D85603#2361168>, @jlebar wrote:
> My understanding is that the semantics of <sm70 convergent are pretty similar to what is described in these examples. But starting in sm70+, each sync operation takes an arg specifying which threads in the warp participate in the instruction.
I believe that what is described here about convergent is, as best I can understand it, the semantics of __syncthreads in CUDA. These semantics are the same for <sm70 and sm70+. It is not clear whether what is described here is a "textually aligned" semantics or an unaligned one. __syncthreads is aligned, meaning that all threads in the thread block must wait on the same lexical __syncthreads() call.
I believe that with sm70 re-convergence has different semantics, because sm70+ provides an independent forward-progress guarantee for threads within a warp. On pre-sm70 hardware, the following could deadlock:
volatile int flag = 0;
if (cond) {              // thread-dependent conditional
  while (flag == 0) ;    // spin-lock
} else {
  flag++;
}
// re-convergence point
On sm70+, it works as expected.
The following also works (it does not deadlock):
volatile int flag = 1;
if (cond) {              // thread-dependent conditional
  while (flag != 0) ;    // spin-lock
}
// re-convergence point
flag++;
> I admit I do not fully understand what the purpose of this is. At one point in time I thought it was to let humans write (or compilers generate) code like this, where the identity of the convergent instruction does not matter.
>
> // Warning, does not seem to work on sm75
> if (cond)
> __syncwarp(FULL_MASK);
> else
> __syncwarp(FULL_MASK);
>
> but my testcase, https://gist.github.com/50d1b5fedc926c879a64436229c1cc05, dies with an illegal-instruction error (715) when I make `cond` have different values within the warp. So, guess not?
>
> Anyway, clearly I don't fully understand the sm70+ convergence semantics. I'd ideally like someone from nvidia (hi, @wash) to speak to whether we can represent their convergent instruction semantics using this proposal. Then we should also double-check that clang can in fact generate the relevant LLVM IR.
>
> Hope this helps.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D85603/new/
https://reviews.llvm.org/D85603