[PATCH] D21162: [CUDA] Implement __shfl* intrinsics in clang headers.
Artem Belevich via cfe-commits
cfe-commits at lists.llvm.org
Thu Jun 9 10:58:42 PDT 2016
tra added inline comments.
================
Comment at: lib/Headers/__clang_cuda_intrinsics.h:77-80
@@ +76,6 @@
+ _Static_assert(sizeof(__tmp) == sizeof(__in)); \
+ memcpy(&__tmp, &__in, sizeof(__in)); \
+ __tmp = ::__FnName(__tmp, __offset, __width); \
+ double __out; \
+ memcpy(&__out, &__tmp, sizeof(__out)); \
+ return __out; \
----------------
Could we use a union instead?
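
For comparison, here is a minimal host-side sketch of the two options. It assumes __tmp is a 64-bit integer (its declaration is not in the quoted hunk) and uses a hypothetical FakeShuffle64 stand-in for the ::__FnName call. Note that reading the inactive member of a union is formally undefined in ISO C++; GCC documents it as supported, and this sketch assumes the compiler in use behaves the same way, whereas memcpy is well-defined everywhere and is normally optimized down to a register move.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the 64-bit integer shuffle that the macro reaches
// through ::__FnName; on the host it simply returns its input.
inline std::int64_t FakeShuffle64(std::int64_t v) { return v; }

// memcpy-based round trip, mirroring the structure of the quoted macro body.
inline double RoundTripMemcpy(double in) {
  std::int64_t tmp;  // Assumption: the real __tmp has a 64-bit integer type.
  static_assert(sizeof(tmp) == sizeof(in), "size mismatch");
  std::memcpy(&tmp, &in, sizeof(in));
  tmp = FakeShuffle64(tmp);
  double out;
  std::memcpy(&out, &tmp, sizeof(out));
  return out;
}

// Union-based alternative. Relies on type punning through a union, which is
// a compiler extension in C++ rather than guaranteed behavior.
inline double RoundTripUnion(double in) {
  union {
    double d;
    std::int64_t i;
  } u;
  u.d = in;
  u.i = FakeShuffle64(u.i);
  return u.d;
}
```

The union version is shorter, but the memcpy version does not depend on extension behavior.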
================
Comment at: lib/Headers/__clang_cuda_intrinsics.h:87
@@ +86,3 @@
+__MAKE_SHUFFLES(__shfl_up, __builtin_ptx_shfl_up_i32, __builtin_ptx_shfl_up_f32,
+ 0);
+__MAKE_SHUFFLES(__shfl_down, __builtin_ptx_shfl_down_i32,
----------------
Ugh. It took me a while to figure out why 0 is used here.
Unlike the other variants, shfl.up apparently checks that the source lane is >= maxLane, so its clamp value has to be 0 rather than 0x1f. Who would have thought.
Might add a comment here so it's not mistaken for a typo.
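
The clamp value can be checked against the shfl pseudo-code in the PTX ISA. Below is a minimal host-side sketch, assuming the third shfl operand carries the segment mask in bits 8-12 and the clamp in bits 0-4, and that the header packs it as ((warpSize - width) << 8) | clamp (the packing is not shown in the quoted hunk). All function names are hypothetical.

```cpp
// Host-side model of source-lane selection for shfl.up and shfl.down,
// following the PTX ISA pseudo-code as I read it. Not device code.
constexpr int kWarpSize = 32;

// Assumed packing of the third shfl operand: segment mask in bits 8-12,
// clamp in bits 0-4.
constexpr unsigned PackC(int width, unsigned clamp) {
  return (static_cast<unsigned>(kWarpSize - width) << 8) | clamp;
}

constexpr int SegMask(unsigned c) { return static_cast<int>((c >> 8) & 0x1f); }
constexpr int Clamp(unsigned c) { return static_cast<int>(c & 0x1f); }

// maxLane = (lane & segmask) | (clamp & ~segmask)
constexpr int MaxLane(int lane, unsigned c) {
  return (lane & SegMask(c)) | (Clamp(c) & ~SegMask(c));
}

// shfl.up: j = lane - b, valid only if j >= maxLane; otherwise keep own lane.
constexpr int ShflUpSource(int lane, int b, unsigned c) {
  return (lane - b >= MaxLane(lane, c)) ? lane - b : lane;
}

// shfl.down: j = lane + b, valid only if j <= maxLane; otherwise keep own lane.
constexpr int ShflDownSource(int lane, int b, unsigned c) {
  return (lane + b <= MaxLane(lane, c)) ? lane + b : lane;
}

// With clamp 0, shfl.up reads from two lanes below, as intended.
static_assert(ShflUpSource(5, 2, PackC(32, 0)) == 3, "up, clamp 0");
// With clamp 0x1f, maxLane becomes 31 and shfl.up never moves data.
static_assert(ShflUpSource(5, 2, PackC(32, 0x1f)) == 5, "up, clamp 0x1f");
// With width 16, lane 17 cannot read below its segment start (lane 16).
static_assert(ShflUpSource(17, 2, PackC(16, 0)) == 17, "up, segment edge");
// shfl.down needs clamp 0x1f so that maxLane is the last lane of the segment.
static_assert(ShflDownSource(30, 1, PackC(32, 0x1f)) == 31, "down in range");
static_assert(ShflDownSource(31, 1, PackC(32, 0x1f)) == 31, "down at edge");
```

In this model, passing 0x1f for shfl.up turns every shuffle into a no-op, which is why the 0 is not a typo.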
http://reviews.llvm.org/D21162