[PATCH] D17561: [CUDA] Add conversion operators for threadIdx, blockIdx, gridDim, and blockDim to uint3 and dim3.
Justin Lebar via cfe-commits
cfe-commits at lists.llvm.org
Wed Feb 24 11:36:56 PST 2016
jlebar added inline comments.
================
Comment at: lib/Headers/cuda_builtin_vars.h:72
@@ -66,1 +71,3 @@
+ // uint3). This function is defined after we pull in vector_types.h.
+ __attribute__((device)) operator uint3() const;
private:
----------------
tra wrote:
> Considering that built-in variables are never instantiated, I wonder how it's going to work as the operator will presumably need 'this' pointing *somewhere*, even if we don't use it. Unused 'this' would probably get optimized away with optimizations on, but -O0 may cause problems.
This is interesting. In the PTX, threadIdx actually does get instantiated, as a non-weak global:

  .global .align 1 .b8 threadIdx[1];

Then we take the address of this thing.
At -O2, we don't emit a threadIdx global at all.
I think this is basically fine. It's actually not right to change extern to static in the decl, because then we try to construct a __cuda_builtin_threadIdx_t, and the default constructor is deleted. :)
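For anyone following along, here is a rough host-side sketch of the shape being discussed. It is not the code in cuda_builtin_vars.h: uint3_like, builtin_idx_like, threadIdx_like and the read_tid_* helpers are all made-up stand-ins, and the real header reads special registers and uses device attributes. It only illustrates why an extern declaration is fine while a definition is not, and why a 1-byte global is a plausible thing to see emitted.

```cpp
#include <type_traits>

// Hypothetical stand-in for CUDA's uint3 (the real type comes from vector_types.h).
struct uint3_like {
  unsigned x, y, z;
};

// Hypothetical stand-ins for the per-thread special-register reads.
inline unsigned read_tid_x() { return 0; }
inline unsigned read_tid_y() { return 0; }
inline unsigned read_tid_z() { return 0; }

// Models the shape of a builtin-variable type: no data members, a deleted
// default constructor, and a conversion operator that never reads through 'this'.
struct builtin_idx_like {
  builtin_idx_like() = delete;
  operator uint3_like() const {
    return uint3_like{read_tid_x(), read_tid_y(), read_tid_z()};
  }
};

// Declaration only: no object is constructed here, so the deleted
// constructor is never needed.
extern const builtin_idx_like threadIdx_like;

// By contrast, a definition such as
//   static const builtin_idx_like broken;
// would be rejected, because it has to construct the object.

static_assert(!std::is_default_constructible<builtin_idx_like>::value,
              "the type cannot be default-constructed");
static_assert(std::is_empty<builtin_idx_like>::value,
              "the type carries no state of its own");
// On common ABIs an empty class still occupies one byte with alignment 1.
static_assert(sizeof(builtin_idx_like) == 1 && alignof(builtin_idx_like) == 1,
              "empty class is one byte, byte-aligned");
static_assert(std::is_convertible<const builtin_idx_like&, uint3_like>::value,
              "the conversion operator is usable on a const object");
```

Everything above is checked at compile time, so nothing ever odr-uses threadIdx_like. Actually calling the conversion operator on it would take the object's address, which is where a real (if empty) global comes into play.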
http://reviews.llvm.org/D17561