[PATCH] D21162: [CUDA] Implement __shfl* intrinsics in clang headers.

Justin Lebar via cfe-commits cfe-commits at lists.llvm.org
Thu Jun 9 12:48:45 PDT 2016


jlebar added inline comments.

================
Comment at: lib/Headers/__clang_cuda_intrinsics.h:77-80
@@ +76,6 @@
+    _Static_assert(sizeof(__tmp) == sizeof(__in));                             \
+    memcpy(&__tmp, &__in, sizeof(__in));                                       \
+    __tmp = ::__FnName(__tmp, __offset, __width);                              \
+    double __out;                                                              \
+    memcpy(&__out, &__tmp, sizeof(__out));                                     \
+    return __out;                                                              \
----------------
tra wrote:
> Could we use a union instead?
I'm pretty sure using a union for this purpose is UB in C++: "[9.5.1] In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time."  It is apparently fine in C11, though; see http://stackoverflow.com/questions/25664848/unions-and-type-punning
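
For reference, the memcpy approach in the hunk above can be sketched on the host roughly as follows. This is only an illustration, not the code in the patch: `fake_shfl_i32` and `shuffle_double_via_memcpy` are hypothetical names, and the stand-in shuffle just returns its input so the round trip can be checked without a GPU.

```cpp
#include <cstring>

// Hypothetical stand-in for a 32-bit shuffle builtin. On the host it simply
// returns its input, so only the type-punning part is exercised.
static inline int fake_shfl_i32(int v) { return v; }

// Hypothetical helper: pass a double through a 32-bit-only primitive by
// copying its bytes into two ints with memcpy, rather than writing one union
// member and reading another.
static inline double shuffle_double_via_memcpy(double in) {
  int halves[2];
  static_assert(sizeof(halves) == sizeof(in), "expects 64-bit double, 32-bit int");
  std::memcpy(halves, &in, sizeof(in));   // double -> two 32-bit halves
  halves[0] = fake_shfl_i32(halves[0]);
  halves[1] = fake_shfl_i32(halves[1]);
  double out;
  std::memcpy(&out, halves, sizeof(out)); // two halves -> double
  return out;
}
```

Compilers generally turn fixed-size memcpy calls like these into plain register moves, so this form should not cost anything over the union version.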

================
Comment at: lib/Headers/__clang_cuda_intrinsics.h:87
@@ +86,3 @@
+__MAKE_SHUFFLES(__shfl_up, __builtin_ptx_shfl_up_i32, __builtin_ptx_shfl_up_f32,
+                0);
+__MAKE_SHUFFLES(__shfl_down, __builtin_ptx_shfl_down_i32,
----------------
tra wrote:
> Ugh. Took me a while to figure out why 0 is used here.
> Unlike the other variants, shfl.up apparently applies to lanes >= maxLane. Who would have thought.
> Might add a comment here so it's not mistaken for a typo.
Done, thanks.
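
To make the reason for the 0 concrete, here is a small host-side model of how shfl picks its source lane, following the pseudocode in the PTX ISA manual as I read it. It is a sketch under assumptions, not the header's code: `ShflMode`, `shfl_source_lane` and `pack_c` are hypothetical names, and the packing of the third operand as `((32 - width) << 8) | clamp` is assumed rather than taken from the patch.

```cpp
#include <cassert>

// Which shfl variant is being modeled.
enum class ShflMode { Up, Down, Bfly, Idx };

// Returns the lane that `lane` reads from. `c` is assumed to hold the clamp
// value in bits [4:0] and the segment mask in bits [12:8].
static int shfl_source_lane(ShflMode mode, int lane, int b, int c) {
  const int bval = b & 0x1f;
  const int cval = c & 0x1f;
  const int mask = (c >> 8) & 0x1f;
  const int max_lane = (lane & mask) | (cval & ~mask & 0x1f);
  const int min_lane = lane & mask;
  int j = lane;
  bool in_range = false;
  switch (mode) {
  case ShflMode::Up:   j = lane - bval; in_range = j >= max_lane; break;
  case ShflMode::Down: j = lane + bval; in_range = j <= max_lane; break;
  case ShflMode::Bfly: j = lane ^ bval; in_range = j <= max_lane; break;
  case ShflMode::Idx:  j = min_lane | (bval & ~mask & 0x1f);
                       in_range = j <= max_lane; break;
  }
  // Out-of-range lanes keep their own value.
  return in_range ? j : lane;
}

// Hypothetical helper: build the c operand for a power-of-two width <= 32.
static int pack_c(int width, int clamp) {
  assert(width > 0 && width <= 32 && (width & (width - 1)) == 0);
  return ((32 - width) << 8) | clamp;
}
```

In this model, shfl.up compares with `>=`, so the clamp acts as a lower bound and has to be 0. Passing 0x1f, as for the other variants, would make nearly every lane read its own value.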


http://reviews.llvm.org/D21162




