[PATCH] D21162: [CUDA] Implement __shfl* intrinsics in clang headers.
Justin Lebar via cfe-commits
cfe-commits at lists.llvm.org
Thu Jun 9 12:48:45 PDT 2016
jlebar added inline comments.
================
Comment at: lib/Headers/__clang_cuda_intrinsics.h:77-80
@@ +76,6 @@
+ _Static_assert(sizeof(__tmp) == sizeof(__in)); \
+ memcpy(&__tmp, &__in, sizeof(__in)); \
+ __tmp = ::__FnName(__tmp, __offset, __width); \
+ double __out; \
+ memcpy(&__out, &__tmp, sizeof(__out)); \
+ return __out; \
----------------
tra wrote:
> Could we use a union instead?
I'm pretty sure that writing one union member and then reading a different one is UB in C++:
"[9.5.1] In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time"
It is apparently fine in C11, though; see http://stackoverflow.com/questions/25664848/unions-and-type-punning
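For reference, a minimal host-side sketch of the memcpy approach, which copies the bytes rather than reading an inactive union member. The names `fake_shfl_i32`, `Halves`, and `shuffle_double` are hypothetical and not part of the patch; the stand-in shuffle simply returns its input so the example can run without a GPU.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a 32-bit shuffle builtin. It returns its input
// unchanged so this sketch can be compiled and run on the host.
static int fake_shfl_i32(int val, int /*offset*/, int width) {
  assert(width > 0 && width <= 32);
  return val;
}

// Shuffle a double by moving its bytes through two 32-bit halves.
// memcpy copies the object representation, so no inactive union member
// is ever read.
static double shuffle_double(double in, int offset, int width) {
  struct Halves {
    int a, b;
  };
  static_assert(sizeof(Halves) == sizeof(double), "size mismatch");
  Halves tmp;
  std::memcpy(&tmp, &in, sizeof(in));
  tmp.a = fake_shfl_i32(tmp.a, offset, width);
  tmp.b = fake_shfl_i32(tmp.b, offset, width);
  double out;
  std::memcpy(&out, &tmp, sizeof(out));
  return out;
}
```

With the identity stand-in, `shuffle_double(3.25, 1, 32)` round-trips the value unchanged. Compilers typically lower fixed-size memcpy calls like these to plain register moves, so the well-defined form should not cost anything.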
================
Comment at: lib/Headers/__clang_cuda_intrinsics.h:87
@@ +86,3 @@
+__MAKE_SHUFFLES(__shfl_up, __builtin_ptx_shfl_up_i32, __builtin_ptx_shfl_up_f32,
+ 0);
+__MAKE_SHUFFLES(__shfl_down, __builtin_ptx_shfl_down_i32,
----------------
tra wrote:
> Ugh. Took me a while to figure out why 0 is used here.
> Unlike the other variants, shfl.up apparently applies to lanes >= maxLane. Who would have thought.
> Might add a comment here so it's not mistaken for a typo.
Done, thanks.
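To spell out why the clamp is 0 for the up variant, here is a host-side sketch of the lane selection as I read the PTX description of shfl. It assumes the third operand packs the segment mask in bits 12:8 and the clamp in bits 4:0; the function names and `kWarpSize` are hypothetical and only model which lane each thread reads from.

```cpp
#include <cassert>

constexpr int kWarpSize = 32;

// Pack the third shuffle operand: segment mask in bits 12:8, clamp in bits 4:0.
static int pack_c(int width, int clamp) {
  assert(width > 0 && width <= kWarpSize && (width & (width - 1)) == 0);
  return ((kWarpSize - width) << 8) | clamp;
}

// Bound that the candidate source lane is compared against.
static int max_lane(int lane, int c) {
  int clamp = c & 0x1f;
  int mask = (c >> 8) & 0x1f;
  return (lane & mask) | (clamp & ~mask & 0x1f);
}

// shfl.up: the source lane j = lane - delta is valid when j >= maxLane.
// An out-of-range thread reads its own value.
static int shfl_up_source_lane(int lane, int delta, int c) {
  int j = lane - delta;
  return j >= max_lane(lane, c) ? j : lane;
}

// shfl.down: the source lane j = lane + delta is valid when j <= maxLane.
static int shfl_down_source_lane(int lane, int delta, int c) {
  int j = lane + delta;
  return j <= max_lane(lane, c) ? j : lane;
}
```

In this model, with a clamp of 0, `shfl_up_source_lane(5, 1, pack_c(32, 0))` gives lane 4, while a clamp of 0x1f makes maxLane 31 and leaves every lane reading itself. That is why the up variant needs 0 where the others use 0x1f.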
http://reviews.llvm.org/D21162