[PATCH] D12246: [NVPTX] change threading intrinsics from noduplicate to convergent

Bjarke Hammersholt Roune via llvm-commits llvm-commits at lists.llvm.org
Fri Aug 21 12:29:28 PDT 2015


broune added a subscriber: broune.
broune added a comment.

I think we'd need a change in loop unrolling for this. Here's an example where the inner loop's trip count is divergent, since it depends on threadIdx.x:

  for (int j = 0; j <= 31 - threadIdx.x; ++j) {
    for (int i = 0; i <= threadIdx.x; ++i) {
      // do something
      __syncthreads();
    }
  }

We can't allow unrolling of the inner loop here: threads that were previously able to meet up at the single __syncthreads() would instead be distributed among the unrolled copies of that call.
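
To make the point above concrete, here is a rough host-side model in plain C++ (not CUDA, so it runs without a GPU). It only records which textual copy of __syncthreads() each thread reaches, in program order, for the example loop nest. The names barrierSites, firstMismatch and kThreads are made up for this sketch. It also assumes the unroller keeps the exit test between copies, and it treats "each thread's k-th barrier" as the one that would have to pair up. That is a simplification, not a description of the hardware or of what LLVM's unroller actually emits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model only: plain C++, not CUDA. It does not model barrier
// hardware; it just tracks which copy ("site") of __syncthreads() is reached.
constexpr int kThreads = 32;  // assumed range of threadIdx.x, as in the example

// Sites visited by thread `tid` when the inner loop body is replicated
// `unroll` times with the exit test kept between copies. For unroll == 2 the
// modeled loop nest is:
//
//   for (int j = 0; j <= 31 - tid; ++j)
//     for (int i = 0; i <= tid; ) {
//       /* copy 0 */ __syncthreads(); ++i; if (i > tid) break;
//       /* copy 1 */ __syncthreads(); ++i;
//     }
//
// unroll == 1 is the original loop, which has a single site (always 0).
std::vector<int> barrierSites(int tid, int unroll) {
  assert(unroll >= 1 && tid >= 0 && tid < kThreads);
  std::vector<int> sites;
  for (int j = 0; j <= kThreads - 1 - tid; ++j)
    for (int i = 0; i <= tid; ++i)
      sites.push_back(i % unroll);  // copy of the body that iteration i lands in
  return sites;
}

// First barrier ordinal k at which two threads, each executing its k-th
// barrier, sit at different sites. Returns -1 if that never happens.
int firstMismatch(int unroll) {
  std::vector<std::vector<int>> perThread;
  for (int t = 0; t < kThreads; ++t)
    perThread.push_back(barrierSites(t, unroll));
  for (std::size_t k = 0;; ++k) {
    int seen = -1;
    bool any = false;
    for (const auto& s : perThread) {
      if (k >= s.size()) continue;  // this thread has no k-th barrier
      any = true;
      if (seen == -1) seen = s[k];
      else if (s[k] != seen) return static_cast<int>(k);
    }
    if (!any) return -1;  // every thread has run out of barriers
  }
}
```

In this model, firstMismatch(1) finds no mismatch because the original loop has only one site. With firstMismatch(2), thread 0 has already left the inner loop and re-entered it at copy 0 while thread 1 is at copy 1, so the two threads no longer arrive at the same call.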

I think loop unrolling will be OK if we change it to unroll a loop containing __syncthreads() only when the trip count is known to be non-divergent (i.e. convergent in the CUDA sense, not in the sense of LLVM's convergent attribute). Jingyue's divergence analysis pass can prove non-divergence.
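
As a sketch of the rule proposed above, the check could look roughly like the following. LoopFacts and mayUnroll are hypothetical names for illustration only; they are not LLVM APIs, and how the two facts would actually be computed (e.g. from the divergence analysis) is left out.

```cpp
#include <cassert>

// Hypothetical summary of what the unroller would need to know about a loop.
// Neither this struct nor its fields exist in LLVM; they are illustrative.
struct LoopFacts {
  bool containsBarrier;     // loop body contains a __syncthreads()-like call
  bool tripCountIsUniform;  // trip count PROVEN identical across threads
};

// Proposed rule: loops without a barrier are unaffected; loops with a barrier
// may be unrolled only when the trip count is known to be non-divergent.
bool mayUnroll(const LoopFacts& f) {
  if (!f.containsBarrier) return true;
  return f.tripCountIsUniform;
}
```

Note that tripCountIsUniform has to mean "proven uniform", so a loop whose trip count the analysis cannot classify is conservatively left alone. The inner loop in the example above has a barrier and a trip count that depends on threadIdx.x, so it would not be unrolled.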


http://reviews.llvm.org/D12246




