r370792 - [CUDA] Use activemask.b32 instruction to implement __activemask w/ CUDA-9.2+
Artem Belevich via cfe-commits
cfe-commits at lists.llvm.org
Tue Sep 3 10:31:58 PDT 2019
Author: tra
Date: Tue Sep 3 10:31:58 2019
New Revision: 370792
URL: http://llvm.org/viewvc/llvm-project?rev=370792&view=rev
Log:
[CUDA] Use activemask.b32 instruction to implement __activemask w/ CUDA-9.2+
vote.ballot instruction is gone in recent CUDA versions and
vote.sync.ballot can not be used because it needs a thread mask parameter.
Fortunately PTX 6.2 (introduced with CUDA-9.2) provides activemask.b32
instruction for this.
Differential Revision: https://reviews.llvm.org/D66665
Modified:
cfe/trunk/lib/Headers/__clang_cuda_intrinsics.h
Modified: cfe/trunk/lib/Headers/__clang_cuda_intrinsics.h
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Headers/__clang_cuda_intrinsics.h?rev=370792&r1=370791&r2=370792&view=diff
==============================================================================
--- cfe/trunk/lib/Headers/__clang_cuda_intrinsics.h (original)
+++ cfe/trunk/lib/Headers/__clang_cuda_intrinsics.h Tue Sep 3 10:31:58 2019
@@ -211,7 +211,15 @@ inline __device__ unsigned int __ballot_
return __nvvm_vote_ballot_sync(mask, pred);
}
-inline __device__ unsigned int __activemask() { return __nvvm_vote_ballot(1); }
+inline __device__ unsigned int __activemask() {
+#if CUDA_VERSION < 9020
+ return __nvvm_vote_ballot(1);
+#else
+ unsigned int mask;
+ asm volatile("activemask.b32 %0;" : "=r"(mask));
+ return mask;
+#endif
+}
inline __device__ unsigned int __fns(unsigned mask, unsigned base, int offset) {
return __nvvm_fns(mask, base, offset);
More information about the cfe-commits
mailing list