[PATCH] D21162: [CUDA] Implement __shfl* intrinsics in clang headers.

Artem Belevich via cfe-commits cfe-commits at lists.llvm.org
Thu Jun 9 10:58:42 PDT 2016


tra added inline comments.

================
Comment at: lib/Headers/__clang_cuda_intrinsics.h:77-80
@@ +76,6 @@
+    _Static_assert(sizeof(__tmp) == sizeof(__in));                             \
+    memcpy(&__tmp, &__in, sizeof(__in));                                       \
+    __tmp = ::__FnName(__tmp, __offset, __width);                              \
+    double __out;                                                              \
+    memcpy(&__out, &__tmp, sizeof(__out));                                     \
+    return __out;                                                              \
----------------
Could we use a union instead?
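
For reference, a minimal host-side sketch of the two punning styles under discussion. It assumes the temporary is a 64-bit integer (the quoted hunk does not show its type), and `fake_shfl_i64` is a hypothetical stand-in for the integer shuffle, not anything in the patch. One caveat on the union form: reading a union member other than the one last written is undefined behavior in ISO C++, although compilers such as GCC and Clang accept it in practice, whereas the memcpy form is well defined.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the 64-bit integer shuffle. On the host it just
// returns its input, so the bit-level round trip can be checked.
inline std::int64_t fake_shfl_i64(std::int64_t v, int /*offset*/, int width) {
  assert(width > 0 && width <= 32);
  return v;
}

// memcpy-based punning, mirroring the shape of the quoted macro body.
inline double shfl_double_memcpy(double in, int offset, int width) {
  std::int64_t tmp;
  static_assert(sizeof(tmp) == sizeof(in), "size mismatch");
  std::memcpy(&tmp, &in, sizeof(in));
  tmp = fake_shfl_i64(tmp, offset, width);
  double out;
  std::memcpy(&out, &tmp, sizeof(out));
  return out;
}

// Union-based punning. Shorter, but it relies on a compiler extension in C++.
inline double shfl_double_union(double in, int offset, int width) {
  static_assert(sizeof(double) == sizeof(std::int64_t), "size mismatch");
  union {
    double d;
    std::int64_t i;
  } u;
  u.d = in;
  u.i = fake_shfl_i64(u.i, offset, width);
  return u.d;
}
```

Both versions should compile to the same code at -O1 and above on mainstream compilers, so the choice is mostly about readability versus strict conformance.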

================
Comment at: lib/Headers/__clang_cuda_intrinsics.h:87
@@ +86,3 @@
+__MAKE_SHUFFLES(__shfl_up, __builtin_ptx_shfl_up_i32, __builtin_ptx_shfl_up_f32,
+                0);
+__MAKE_SHUFFLES(__shfl_down, __builtin_ptx_shfl_down_i32,
----------------
Ugh. It took me a while to figure out why 0 is used here.
Unlike the other variants, shfl.up apparently treats a source lane as in range when it is >= maxLane, rather than <= maxLane. Who would have thought.
Maybe add a comment here so the 0 is not mistaken for a typo.
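
To make the reasoning concrete, here is a small host-side emulation of the source-lane selection as I read it in the PTX ISA description of `shfl`. It assumes the `c` operand carries the segment mask in bits 12:8 and the clamp value in bits 4:0. The names `pack_shfl_c`, `shfl_source_lane`, and `ShflMode` are made up for this sketch and are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

enum class ShflMode { Up, Down };

constexpr int kWarpSize = 32;

// Packs the "c" operand: segment mask in bits 12:8, clamp value in bits 4:0.
// width must be a power of two in [1, 32].
inline std::uint32_t pack_shfl_c(int width, std::uint32_t clamp) {
  assert(width > 0 && width <= kWarpSize && (width & (width - 1)) == 0);
  return (static_cast<std::uint32_t>(kWarpSize - width) << 8) | (clamp & 0x1f);
}

// Returns the lane whose value `lane` would read. If the computed source lane
// is out of range, the lane reads its own value.
inline int shfl_source_lane(ShflMode mode, int lane, int delta,
                            std::uint32_t c) {
  const int segmask = static_cast<int>((c >> 8) & 0x1f);
  const int cval = static_cast<int>(c & 0x1f);
  const int max_lane = (lane & segmask) | (cval & ~segmask & 0x1f);

  int j = 0;
  bool in_range = false;
  if (mode == ShflMode::Up) {
    j = lane - delta;
    in_range = j >= max_lane;  // up: the bound is a lower limit
  } else {
    j = lane + delta;
    in_range = j <= max_lane;  // down: the bound is an upper limit
  }
  return in_range ? j : lane;
}
```

With a clamp of 0, lane 5 shuffling up by 2 reads lane 3. With a clamp of 0x1f, the same call would leave lane 5 reading itself, because 3 >= 31 is false. That is why the up variant passes 0 while the other variants pass 0x1f.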


http://reviews.llvm.org/D21162




