[PATCH] D12246: [NVPTX] change threading intrinsics from noduplicate to convergent
escha via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 28 17:15:28 PDT 2015
> On Aug 28, 2015, at 5:01 PM, Bjarke Hammersholt Roune via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>
> broune added a comment.
>
> In http://reviews.llvm.org/D12246#235587, @resistor wrote:
>
>> I don’t think the example code here is legal under any SPMD models I am aware of. It’s generally not legal to have barrier operations under divergent control flow, such as divergent trip-count loops.
>>
>> From the CUDA docs:
>>
>>> __syncthreads() is allowed in conditional code but only if the conditional evaluates identically across the entire thread block, otherwise the code execution is likely to hang or produce unintended side effects.
>>
>>
>> —Owen
>
>
> I assume that you're referring to my example. I agree, the input program in my example is not valid (and also the loop bounds aren't quite right), so that example doesn't show a problem. I'm still concerned about loop unrolling in a case such as this:
>
>   for (int i = 0; i < *bound; ++i) {
>     if (i == 0)
>       __syncthreads();
>   }
>
> This input program is valid as long as *bound > 0 holds uniformly across the block. Here, unrolling the loop by a factor of 2 will peel off one iteration into a duplicate body for the case where *bound is odd. I checked with a similar loop that doesn't use __syncthreads(), and LLVM does perform unrolling by a factor of 2 in this way. If the parity of *bound is divergent (which is allowed, since validity only requires *bound > 0 to be uniform), then only part of the warp would execute the __syncthreads() in the duplicated odd-case loop body.
This would be a violation of convergent semantics, IMO. Anything that causes a convergent instruction that was uniform to become non-uniform is invalid, I think, duplication or not.
—escha