[PATCH] D12246: [NVPTX] change threading intrinsics from noduplicate to convergent

Fri Aug 28 15:41:24 PDT 2015

I don’t think the example code here is legal under any SPMD models I am aware of.  It’s generally not legal to have barrier operations under divergent control flow, such as divergent trip-count loops.

From the CUDA docs:

> __syncthreads() is allowed in conditional code but only if the conditional evaluates identically across the entire thread block, otherwise the code execution is likely to hang or produce unintended side effects.

—Owen

> On Aug 21, 2015, at 12:29 PM, Bjarke Hammersholt Roune <broune at google.com> wrote:
> 
> broune added a subscriber: broune.
> broune added a comment.
> 
> I think we'd need a change in loop unrolling for this. Here's an example, where the trip count is divergent:
> 
>  for (int j = 0; j <= 31 - threadIdx.x; ++j) {
>    for (int i = 0; i <= threadIdx.x; ++i) {
>      // do something
>      __syncthreads();
>    }
>  }
> 
> We can't allow unrolling of the inner loop here, since then threads that were previously able to meet up at the single syncthreads will instead be distributed among the unrolled syncthreads copies.
> 
> I think that loop unrolling will be OK if we change it so that it only unrolls loops that contain syncthreads if the trip count is known to be not divergent (i.e. convergent, but in the CUDA sense, not in the LLVM sense). Jingyue's divergence analysis pass can prove non-divergence.
> 
> 
> http://reviews.llvm.org/D12246
> 
> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150828/7d21155a/attachment.html>