[PATCH] D12246: [NVPTX] change threading intrinsics from noduplicate to convergent

Xuetian Weng via llvm-commits llvm-commits at lists.llvm.org
Sun Aug 30 22:12:50 PDT 2015


wengxt marked an inline comment as done.
wengxt added a comment.

In http://reviews.llvm.org/D12246#235675, @broune wrote:

>   for (int i = 0; i < *bound; ++i) {
>     if (i == 0)
>       __syncthreads();
>   }
>   
>
> This input program is valid as long as *bound > 0 has the same value across the block. Here loop-unrolling by a factor of 2 will separate off the first iteration of the loop into a duplicate body for the case where *bound is odd. I checked with an example loop that's similar but that doesn't use __syncthreads() and LLVM does do unrolling by a factor of 2 in this way. If whether *bound is odd is divergent, then only part of the warp would execute the __syncthreads() in the duplicate odd-case unrolled loop body. So I think that unrolling does have to be careful with divergent trip counts for loops that include __syncthreads() in cases such as this.


IMHO, in a loop it is not possible for two threads that are running different iterations to sync together; that situation is already divergent. In that sense, if two executions of a function call converge in the original code, they will also converge after unrolling.

Thus I don't think there is any problem in this example.
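
To make the two readings concrete, here is a small host-side C++ model of the quoted loop. It is only a sketch and not actual LLVM output: the function names, the call-site labels ("body", "peeled", "body_a", "body_b") and the exact shape of the unrolled loop are assumptions made for illustration. For a given trip count it records which static copy of the __syncthreads() call would run, and in which iteration.

```cpp
#include <string>
#include <vector>

// One barrier execution: which static call site ran it, and in which iteration.
struct BarrierEvent {
    std::string site;  // hypothetical label for the static copy of the call
    int iteration;     // value of i when the barrier would run
};

// Model of the original loop: a single static call site.
std::vector<BarrierEvent> original_loop(int bound) {
    std::vector<BarrierEvent> events;
    for (int i = 0; i < bound; ++i) {
        if (i == 0) events.push_back({"body", i});
    }
    return events;
}

// Hand-written model of unrolling by 2, assuming the odd-trip-count case
// runs the first iteration in a separate, duplicated body.
std::vector<BarrierEvent> unrolled_loop(int bound) {
    std::vector<BarrierEvent> events;
    int i = 0;
    if (bound > 0 && bound % 2 != 0) {
        // Duplicated body, taken only when the trip count is odd.
        if (i == 0) events.push_back({"peeled", i});
        ++i;
    }
    for (; i < bound; i += 2) {
        // Two copies of the original body per trip of the unrolled loop.
        if (i == 0) events.push_back({"body_a", i});
        if (i + 1 == 0) events.push_back({"body_b", i + 1});
    }
    return events;
}
```

In this model every thread with a positive bound still executes the barrier exactly once, in iteration 0, both before and after unrolling. What changes is the static copy of the call: a thread with an odd bound reaches the "peeled" copy, while a thread with an even bound reaches "body_a".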


http://reviews.llvm.org/D12246




