[PATCH] D17657: [NVPTX] Add intrinsics to support named barriers.
Justin Lebar via llvm-commits
llvm-commits at lists.llvm.org
Sat Feb 27 09:43:15 PST 2016
jlebar added a subscriber: jlebar.
Comment at: include/llvm/IR/IntrinsicsNVVM.td:743
@@ -740,1 +742,3 @@
+ def int_nvvm_barrier : GCCBuiltin<"__nvvm_bar">,
Intrinsic<[], [llvm_i32_ty, llvm_i32_ty], [IntrNoDuplicate]>;
def int_nvvm_barrier0_popc : GCCBuiltin<"__nvvm_bar0_popc">,
I think you want convergent here, in addition to noduplicate. (I have patches which let us finally remove noduplicate for the other barriers, but I'm still waiting on reviews.)
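Concretely, something like the following is what I have in mind. This is only a sketch: it assumes the generic IntrConvergent property from Intrinsics.td is the right spelling to use here, and that the result-type list is empty because the barrier returns nothing.

```
// Sketch only: same def as in the patch, with convergent added next to
// noduplicate. IntrConvergent and the empty result list are assumptions.
def int_nvvm_barrier : GCCBuiltin<"__nvvm_bar">,
    Intrinsic<[], [llvm_i32_ty, llvm_i32_ty],
              [IntrNoDuplicate, IntrConvergent]>;
```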
Convergent is necessary to prevent the compiler from splitting an instruction such that some threads may run one copy, while others may run the other copy. This matters because "barriers are executed on a per-warp basis as if all the threads in a warp are active", so if some threads are currently inactive (because we added a control-flow dependency to the convergent op), then the barrier will never complete.
I'm not 100% sure, because this is for intra-CTA -- not merely intra-warp -- synchronization, but I don't think you'll need noduplicate once we can remove it elsewhere. The main way that noduplicate is stricter than convergent is that you can't inline functions that contain a noduplicate instruction (unless you're only inlining into one place), and you can't unroll loops that contain noduplicate instructions. Neither of those should be a problem here.