[PATCH] D12246: [NVPTX] change threading intrinsics from noduplicate to convergent
Bjarke Hammersholt Roune via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 21 12:29:28 PDT 2015
broune added a subscriber: broune.
broune added a comment.
I think loop unrolling would need to change for this. Here's an example where the trip count is divergent:
    for (int j = 0; j <= 31 - threadIdx.x; ++j) {
      for (int i = 0; i <= threadIdx.x; ++i) {
        // do something
        __syncthreads();
      }
    }
We can't allow unrolling of the inner loop here. Threads that could previously meet up at the single __syncthreads() would instead be spread across the unrolled copies of it.
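To make the unrolling problem concrete, here is a toy host-side model. It is not LLVM's actual unroller output. Each thread runs the inner loop threadIdx.x + 1 times, and the model records which static copy of __syncthreads() each iteration reaches. For unrolling by 2 it assumes the body is duplicated (copies 'A' and 'B') and that a leftover odd iteration runs in a separate remainder copy 'R'. The names barrierTrace and threadsMeetAtSameBarrier are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: returns the sequence of static __syncthreads() copies a
// thread reaches while running the inner loop `tripCount` times.
//   unrollFactor == 1: original loop, one barrier labelled 'S'.
//   unrollFactor == 2: body duplicated ('A', 'B'); an odd leftover iteration
//   runs in an assumed epilogue-style remainder copy 'R'.
std::string barrierTrace(int tripCount, int unrollFactor) {
  assert(unrollFactor == 1 || unrollFactor == 2);
  if (unrollFactor == 1)
    return std::string(tripCount, 'S');
  std::string trace;
  int mainIters = tripCount / 2 * 2;  // iterations handled by the unrolled body
  for (int i = 0; i < mainIters; ++i)
    trace.push_back(i % 2 == 0 ? 'A' : 'B');
  if (tripCount % 2 != 0)
    trace.push_back('R');  // remainder iteration, third copy of the barrier
  return trace;
}

// True if, at every step k, all threads that reach a k-th barrier reach the
// same static copy of it, i.e. they can still meet up.
bool threadsMeetAtSameBarrier(int numThreads, int unrollFactor) {
  std::vector<std::string> traces;
  size_t longest = 0;
  for (int tid = 0; tid < numThreads; ++tid) {
    // `for (int i = 0; i <= threadIdx.x; ++i)` runs tid + 1 times.
    traces.push_back(barrierTrace(tid + 1, unrollFactor));
    if (traces.back().size() > longest)
      longest = traces.back().size();
  }
  for (size_t k = 0; k < longest; ++k) {
    char seen = 0;
    for (const std::string& t : traces) {
      if (k >= t.size())
        continue;  // this thread has already left the loop
      if (seen == 0)
        seen = t[k];
      else if (t[k] != seen)
        return false;  // threads split across different barrier copies
    }
  }
  return true;
}
```

With 32 threads the original loop keeps every thread on 'S', but after unrolling by 2 thread 0 (trip count 1) hits 'R' at its first barrier while thread 1 (trip count 2) hits 'A'. A prologue-style remainder splits the threads in the same way.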
I think loop unrolling will be OK if we change it to unroll a loop that contains __syncthreads() only when the loop's trip count is known to be non-divergent (i.e. convergent in the CUDA sense, not in the LLVM sense). Jingyue's divergence analysis pass can prove non-divergence.
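A minimal sketch of the proposed gate follows, assuming a hypothetical toy expression type for trip counts rather than LLVM IR. It treats reads of threadIdx.x as the source of divergence and propagates divergence through data dependence only. A real divergence analysis also has to account for control dependence, which this sketch ignores. The names Expr, isDivergent and mayUnroll are illustrative, not LLVM APIs.

```cpp
#include <cassert>

// Hypothetical expression node for a loop trip count (not LLVM's IR).
struct Expr {
  enum Kind { Constant, KernelArg, BlockDimX, ThreadIdxX, Add, Sub };
  Kind kind;
  const Expr* lhs;  // operands, used only by Add and Sub
  const Expr* rhs;
};

// Toy divergence check: threadIdx.x is divergent; constants, kernel
// arguments and blockDim.x are uniform across a block; arithmetic is
// divergent if any operand is.
bool isDivergent(const Expr& e) {
  switch (e.kind) {
    case Expr::ThreadIdxX:
      return true;
    case Expr::Constant:
    case Expr::KernelArg:
    case Expr::BlockDimX:
      return false;
    case Expr::Add:
    case Expr::Sub:
      assert(e.lhs != nullptr && e.rhs != nullptr);
      return isDivergent(*e.lhs) || isDivergent(*e.rhs);
  }
  return true;  // unknown kinds are conservatively divergent
}

// Proposed rule: a loop containing __syncthreads() may be unrolled only if
// its trip count is provably non-divergent; other loops are unaffected.
bool mayUnroll(bool containsSyncthreads, const Expr& tripCount) {
  return !containsSyncthreads || !isDivergent(tripCount);
}
```

For the example above, the inner trip count threadIdx.x + 1 and the outer bound 31 - threadIdx.x are both divergent, so neither loop would be unrolled, while a loop bounded by blockDim.x still could be.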
http://reviews.llvm.org/D12246