[PATCH] D12246: [NVPTX] change threading intrinsics from noduplicate to convergent
escha via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 28 17:15:28 PDT 2015
> On Aug 28, 2015, at 5:01 PM, Bjarke Hammersholt Roune via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>
> broune added a comment.
>
> In http://reviews.llvm.org/D12246#235587, @resistor wrote:
>
>> I don’t think the example code here is legal under any SPMD models I am aware of. It’s generally not legal to have barrier operations under divergent control flow, such as divergent trip-count loops.
>>
>> From the CUDA docs:
>>
>>> __syncthreads() is allowed in conditional code but only if the conditional evaluates identically across the entire thread block, otherwise the code execution is likely to hang or produce unintended side effects.
>>
>>
>> —Owen
>
>
> I assume that you're referring to my example. I agree, the input program in my example is not valid (and also the loop bounds aren't quite right), so that example doesn't show a problem. I'm still concerned about loop unrolling in a case such as this:
>
>   for (int i = 0; i < *bound; ++i) {
>     if (i == 0)
>       __syncthreads();
>   }
>
> This input program is valid as long as *bound > 0 holds uniformly across the block. Here, unrolling the loop by a factor of 2 will peel off one iteration into a duplicate body for the case where *bound is odd. I checked with a similar loop that doesn't use __syncthreads(), and LLVM does perform unrolling by a factor of 2 in this way. If the parity of *bound is divergent (which is allowed, since validity only requires *bound > 0 to be uniform), then only part of the warp would execute the __syncthreads() in the duplicated odd-case loop body.
This would be a violation of convergent semantics, IMO. Anything that causes a convergent instruction that was uniform to become non-uniform is invalid, I think, duplication or not.
—escha