[PATCH] D12246: [NVPTX] change threading intrinsics from noduplicate to convergent
Bjarke Hammersholt Roune via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 28 17:01:55 PDT 2015
broune added a comment.
In http://reviews.llvm.org/D12246#235587, @resistor wrote:
> I don’t think the example code here is legal under any SPMD models I am aware of. It’s generally not legal to have barrier operations under divergent control flow, such as divergent trip-count loops.
>
> From the CUDA docs:
>
> > __syncthreads() is allowed in conditional code but only if the conditional evaluates identically across the entire thread block, otherwise the code execution is likely to hang or produce unintended side effects.
>
>
> —Owen
I assume that you're referring to my example. I agree: the input program in my example is not valid (and the loop bounds aren't quite right either), so that example doesn't demonstrate a problem. I'm still concerned about loop unrolling in a case such as this:
for (int i = 0; i < *bound; ++i) {
  if (i == 0)
    __syncthreads();
}
This input program is valid as long as *bound > 0 evaluates identically across the block, since every thread then executes __syncthreads() exactly once, in iteration 0. Unrolling this loop by a factor of 2 will peel the first iteration off into a duplicate body that runs only when *bound is odd. I checked with a similar loop that doesn't use __syncthreads(), and LLVM does unroll by a factor of 2 in this way.

If the parity of *bound differs across the threads of a warp, then only part of the warp would execute the __syncthreads() in the duplicated odd-case body, while the rest would reach the copy inside the unrolled loop. So I think unrolling does have to be careful with divergent trip counts for loops that contain __syncthreads(), in cases such as this.
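To make the parity argument concrete, here is a small host-side C++ model. It is a sketch under stated assumptions, not LLVM's actual unroller output: each simulated thread records which static copy of the barrier call it reaches, and a counter increment stands in for __syncthreads(). The function names and the exact shape of the peeled prologue are hypothetical illustrations.

```cpp
// Hypothetical host-side model (not CUDA, not real LLVM unroller output):
// for each simulated thread, count which static copy of the barrier call
// ("barrier site") it reaches.
#include <cassert>
#include <map>
#include <string>
#include <vector>

using SiteCounts = std::map<std::string, int>;

// Original loop: the barrier lives at a single static site.
void runOriginal(int bound, SiteCounts &hits) {
  for (int i = 0; i < bound; ++i) {
    if (i == 0)
      ++hits["loop"];  // stands in for __syncthreads()
  }
}

// Assumed shape of unrolling by 2 with a one-iteration prologue that is
// taken only for odd trip counts.
void runUnrolledBy2(int bound, SiteCounts &hits) {
  int i = 0;
  if (bound % 2 == 1) {  // peeled iteration: odd bounds only
    if (i == 0)
      ++hits["prologue"];
    ++i;
  }
  for (; i < bound; i += 2) {  // remaining trip count is even
    if (i == 0)
      ++hits["body.0"];  // first copy of the loop body
    if (i + 1 == 0)
      ++hits["body.1"];  // second copy; never true for i >= 0
  }
}

// Run one simulated thread per entry in `bounds`. Every bound must be > 0
// so that the original program is valid (each thread syncs exactly once).
SiteCounts simulate(const std::vector<int> &bounds, bool unrolled) {
  SiteCounts hits;
  for (int b : bounds) {
    assert(b > 0 && "*bound > 0 must hold for every thread");
    if (unrolled)
      runUnrolledBy2(b, hits);
    else
      runOriginal(b, hits);
  }
  return hits;
}
```

With per-thread bounds {1, 2, 3, 4} (all positive, so the original program is valid), the original loop sends all four threads to the single `loop` site, while the unrolled version splits them two and two between `prologue` and `body.0`. With uniformly even bounds such as {2, 4, 6, 8}, all four threads land in `body.0`, so the split only appears when the parity of *bound diverges.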
http://reviews.llvm.org/D12246