[PATCH] D85603: IR: Add convergence control operand bundle and intrinsics
Vinod Grover via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Nov 2 19:12:46 PST 2020
vgrover99 added a comment.
In D85603#2361168 <https://reviews.llvm.org/D85603#2361168>, @jlebar wrote:
> My understanding is that the semantics of <sm70 convergent are pretty similar to what is described in these examples. But starting in sm70+, each sync operation takes an arg specifying which threads in the warp participate in the instruction.
I believe that what is described here about convergent is, as best I can understand it, the semantics of __syncthreads in CUDA. These semantics are the same for <sm70 and sm70+. It is not clear whether what is described here is a "textually aligned" semantics or an unaligned one. __syncthreads is aligned, meaning that all threads in the thread block must wait on the same lexical __syncthreads() call.
I believe that with sm70 re-convergence has different semantics, because sm70+ provides an independent forward-progress guarantee for threads within a warp. On pre-sm70 hardware, the following could deadlock:
volatile int flag = 0;
if (cond) {              // thread-dependent conditional
  while (flag == 0) ;    // spin-lock
} else {
  flag++;
}
// re-convergence point
On sm70+, it works as expected.
The following also works (it does not deadlock):
volatile int flag = 1;
if (cond) {              // thread-dependent conditional
  while (flag != 0) ;    // spin-lock
}
// re-convergence point
flag++;
> I admit I do not fully understand what the purpose of this is. At one point in time I thought it was to let humans write (or compilers generate) code like this, where the identity of the convergent instruction does not matter.
>
> // Warning, does not seem to work on sm75
> if (cond)
> __syncwarp(FULL_MASK);
> else
> __syncwarp(FULL_MASK);
>
> but my testcase, https://gist.github.com/50d1b5fedc926c879a64436229c1cc05, dies with an illegal-instruction error (715) when I make `cond` have different values within the warp. So, guess not?
>
> Anyway, clearly I don't fully understand the sm70+ convergence semantics. I'd ideally like someone from nvidia (hi, @wash) to speak to whether we can represent their convergent instruction semantics using this proposal. Then we should also double-check that clang can in fact generate the relevant LLVM IR.
>
> Hope this helps.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D85603/new/
https://reviews.llvm.org/D85603