[PATCH] D17561: [CUDA] Add conversion operators for threadIdx, blockIdx, gridDim, and blockDim to uint3 and dim3.

Justin Lebar via cfe-commits cfe-commits at lists.llvm.org
Wed Feb 24 11:36:56 PST 2016

jlebar added inline comments.

Comment at: lib/Headers/cuda_builtin_vars.h:72
@@ -66,1 +71,3 @@
+  // uint3).  This function is defined after we pull in vector_types.h.
+  __attribute__((device)) operator uint3() const;
tra wrote:
> Considering that built-in variables are never instantiated, I wonder how it's going to work as the operator will presumably need 'this' pointing *somewhere*, even if we don't use it. Unused 'this' would probably get optimized away with optimizations on, but -O0 may cause problems.
This is interesting.  In the PTX at -O0, threadIdx actually gets instantiated, as a non-weak global:

  .global .align 1 .b8 threadIdx[1];

Then we take the address of this global and pass it as 'this'.

At -O2, we don't emit a threadIdx global at all.

I think this is basically fine.  Changing extern to static in the decl would actually not be right: a static decl is also a definition, so we would try to construct a __cuda_builtin_threadIdx_t, and the default constructor is deleted.  :)
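
To make the extern-vs-static point concrete, here is a minimal host-side sketch in plain C++.  It is not the real header: uint3_like, builtin_idx_t, the read_* helpers, and fake_thread_idx are hypothetical stand-ins, and the fixed return values merely take the place of the special-register reads.

```cpp
#include <type_traits>

// Hypothetical stand-in for CUDA's uint3 (not the real vector_types.h type).
struct uint3_like {
  unsigned x, y, z;
};

// Hypothetical stand-in for __cuda_builtin_threadIdx_t.
struct builtin_idx_t {
  // Placeholders for the special-register reads; fixed values for illustration.
  static unsigned read_x() { return 1; }
  static unsigned read_y() { return 2; }
  static unsigned read_z() { return 3; }

  // Non-static member: a call passes the object's address as 'this',
  // even though the body never dereferences it.
  operator uint3_like() const {
    return uint3_like{read_x(), read_y(), read_z()};
  }

  // No instance may ever be default-constructed.
  builtin_idx_t() = delete;
};

// A declaration only: no constructor runs and no storage is defined here.
extern const builtin_idx_t fake_thread_idx;

// A static decl would be a definition and would need the deleted constructor:
// static const builtin_idx_t fake_thread_idx_static;  // does not compile

// Compile-time checks of the two properties discussed above.
constexpr bool can_default_construct() {
  return std::is_default_constructible<builtin_idx_t>::value;
}

constexpr bool converts_to_uint3() {
  return std::is_convertible<builtin_idx_t, uint3_like>::value;
}
```

In this sketch the extern declaration compiles because nothing is constructed, while the commented-out static line would be rejected.  Actually invoking the conversion on fake_thread_idx in host code would odr-use an object that is never defined, which is the host-side analogue of tra's question about where 'this' points.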

