[PATCH] D16941: [NVPTX] Mark nvvm synchronizing intrinsics as convergent.

Jingyue Wu via llvm-commits llvm-commits at lists.llvm.org
Fri Feb 5 20:38:46 PST 2016


jingyue added a comment.

I was referring to this paragraph from the PTX documentation:

> Barriers are executed on a per-warp basis as if all the threads in a warp are active. Thus, if any thread in a warp executes a bar instruction, it is as if all the threads in the warp have executed the bar instruction. All threads in the warp are stalled until the barrier completes, and the arrival count for the barrier is incremented by the warp size (not the number of active threads in the warp). In conditionally executed code, a bar instruction should only be used if it is known that all threads evaluate the condition identically (the warp does not diverge). Since barriers are executed on a per-warp basis, the optional thread count must be a multiple of the warp size.

> 

> Read more at: http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#ixzz3zMFDtpKf


This is related to how NVIDIA GPUs deal with divergent code. If threads in a warp diverge on a branch instruction, the threads that take one side of the branch execute until they reach the post-dominator of the branch instruction, and then the remaining threads execute until they reach the post-dominator as well. At that point, all threads reconverge and continue together.
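As a rough illustration of the divergence described above, here is a minimal CUDA sketch. The kernel name `divergent_branch`, the host helper `run_divergence_demo`, and the even/odd condition are all hypothetical and chosen only for this example. The store after the `if`/`else` sits at the post-dominator of the branch, which is where the two groups of threads meet again.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: even and odd lanes of a warp take different sides
// of the branch, so the warp diverges here.
__global__ void divergent_branch(int *out) {
  int tid = threadIdx.x;
  int value;
  if (tid % 2 == 0) {
    value = 1;  // executed by the even lanes
  } else {
    value = 2;  // executed by the odd lanes
  }
  // Post-dominator of the branch: both groups of threads arrive here and
  // continue together.
  out[tid] = value;
}

// Hypothetical host helper: launches one warp (32 threads) and returns the
// sum of the per-thread results, or -1 on a CUDA error.
int run_divergence_demo() {
  const int kThreads = 32;
  int *d_out = nullptr;
  if (cudaMalloc(&d_out, kThreads * sizeof(int)) != cudaSuccess) return -1;
  divergent_branch<<<1, kThreads>>>(d_out);
  int h_out[kThreads];
  // cudaMemcpy waits for the kernel to finish before copying.
  cudaError_t err =
      cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
  cudaFree(d_out);
  if (err != cudaSuccess) return -1;
  int sum = 0;
  for (int i = 0; i < kThreads; ++i) sum += h_out[i];
  return sum;
}
```

Note that there is no barrier inside the divergent region here; that is the case the quoted paragraph warns about.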

Consider the following example for function inlining.

  void foo() {
    if (condition2) {
      __syncthreads();
    }
  }
  
  void bar() {
    if (condition1) {
      foo();
    } else {
      foo();
    }
  }

`__syncthreads` being non-divergent guarantees that threads in the same warp either

1. all reach this `__syncthreads` or
2. all miss this `__syncthreads`.

Therefore, `condition2` must be non-divergent. Moreover, `condition1` must be non-divergent too; otherwise, some of the threads would enter the first call site of `foo` and get stuck there before the other threads have a chance to enter the second call site of `foo`. In other words, not only is `__syncthreads` non-divergent, but its transitive call sites are non-divergent as well. Therefore, function inlining is safe.
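To make the argument concrete, here is a sketch of what `bar` looks like once both calls to `foo` are inlined. The parameter lists, the name `bar_inlined`, the kernel, and the host helper `run_inlining_demo` are assumptions added for illustration; they are not part of the patch. The conditions are passed as kernel arguments, so every thread evaluates them identically, as the reasoning above requires.

```cuda
#include <cuda_runtime.h>

// Same shape as the example above, with the conditions made explicit
// parameters (an assumption of this sketch).
__device__ void foo(bool condition2) {
  if (condition2) {
    __syncthreads();
  }
}

__device__ void bar(bool condition1, bool condition2) {
  if (condition1) {
    foo(condition2);
  } else {
    foo(condition2);
  }
}

// Hand-written result of inlining foo into both call sites of bar. Each
// __syncthreads() is still guarded by the same pair of conditions as before.
__device__ void bar_inlined(bool condition1, bool condition2) {
  if (condition1) {
    if (condition2) {
      __syncthreads();
    }
  } else {
    if (condition2) {
      __syncthreads();
    }
  }
}

// Hypothetical kernel: the conditions are uniform across the block, so all
// threads reach the same barrier in both versions.
__global__ void inlining_kernel(bool condition1, bool condition2, int *out) {
  bar(condition1, condition2);
  bar_inlined(condition1, condition2);
  out[threadIdx.x] = 1;
}

// Hypothetical host helper: returns how many threads finished, or -1 on a
// CUDA error.
int run_inlining_demo(bool condition1, bool condition2) {
  const int kThreads = 64;  // two warps
  int *d_out = nullptr;
  if (cudaMalloc(&d_out, kThreads * sizeof(int)) != cudaSuccess) return -1;
  cudaMemset(d_out, 0, kThreads * sizeof(int));
  inlining_kernel<<<1, kThreads>>>(condition1, condition2, d_out);
  int h_out[kThreads];
  cudaError_t err =
      cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
  cudaFree(d_out);
  if (err != cudaSuccess) return -1;
  int finished = 0;
  for (int i = 0; i < kThreads; ++i) finished += h_out[i];
  return finished;
}
```

Inlining only copies the body of `foo` into each call site, so the set of threads that reach each barrier is unchanged as long as `condition1` and `condition2` are non-divergent.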


http://reviews.llvm.org/D16941




