[cfe-dev] [CUDA][RFC] Clang support for CUDA 11.4, issues with <cooperative_groups.h>

Johannes Doerfert via cfe-dev cfe-dev at lists.llvm.org
Thu Nov 4 10:52:44 PDT 2021


It's not a CUDA issue per se (IMHO) and not related to the builtin at all.

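C++ says this ain't alright: the call has no template-dependent arguments,
so the name has to be declared where the template is defined, instantiated
or not. nvcc is just very forgiving, apparently: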
https://godbolt.org/z/14xYfbKdz
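
To illustrate the rule, here is a minimal sketch in plain C++ (not CUDA).
The names `demo::Tag`, `late_fn` and `uses_dependent_name` are made up for
the example. A call whose argument depends on a template parameter is
resolved at instantiation, whereas a call like `__match_any_sync(0,0)` has no
dependent arguments and needs a declaration at the template definition:

```cpp
#include <cassert>

namespace demo {
struct Tag {};  // hypothetical type, only used to drive argument-dependent lookup
}

// Dependent call: the argument type depends on T, so lookup of `late_fn`
// is completed at instantiation, where argument-dependent lookup can find it.
template <typename T>
int uses_dependent_name(T value) {
    return late_fn(value);
}

namespace demo {
// Declared after the template, but before the template is instantiated.
int late_fn(Tag) { return 42; }
}

// Non-dependent call: nothing in the call depends on T, so a conforming
// compiler rejects this at the definition, even if it is never instantiated.
//
// template <typename T>
// int uses_non_dependent_name() {
//     return never_declared(0, 0);  // error: use of undeclared identifier
// }
```

Calling `uses_dependent_name(demo::Tag{})` compiles and returns 42;
uncommenting the second template should fail in Clang regardless of whether
it is ever called.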

I'd recommend either declaring the builtin yourself in your header or using
a CUDA version that comes with a declaration.
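
A rough sketch of the first option, in plain C++ with stand-in names.
`DEMO_ARCH`, `match_any_stub` and `partition_mask` are hypothetical; in real
code you would have to match the builtin's actual signature and `__device__`
qualifier, and guard on the real architecture macro:

```cpp
#include <cassert>

// Assumption: DEMO_ARCH stands in for the target-architecture macro and
// match_any_stub stands in for the builtin. Neither is a real CUDA name.
#ifndef DEMO_ARCH
#define DEMO_ARCH 610
#endif

#if DEMO_ARCH < 700
// The vendor header does not declare the builtin for this architecture,
// so declare it ourselves before the header's template is parsed.
unsigned int match_any_stub(unsigned int mask, int value);
#endif

// Stand-in for the template in the vendor header that uses the builtin
// with non-dependent arguments.
template <typename T>
unsigned int partition_mask(unsigned int this_mask, int pred) {
    return match_any_stub(this_mask, pred);
}

#if DEMO_ARCH < 700
// Defined only so that this host-side demo links and can be exercised.
unsigned int match_any_stub(unsigned int mask, int /*value*/) { return mask; }
#endif
```

The declaration only has to be visible before the template that uses it;
whether the real builtin also needs a definition on your architecture
depends on whether that template is ever instantiated there.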

~ Johannes


On 11/4/21 02:44, Carlos Galvez via cfe-dev wrote:
> Hi!
>
> I wanted to ask what's the expected support for CUDA 11.4 in Clang? We want
> to update from 10.2 to 11.4, and came across the following issue when
> compiling:
>
> // foo.cu
> #include <cooperative_groups.h>
>
> Just by including this header, and compiling for "--cuda-gpu-arch=sm_61",
> we get the following error building with Clang:
>
> /usr/local/cuda-11.4/targets/x86_64-linux/include/cooperative_groups/details/partitioning.h:85:32:
> error: use of undeclared identifier '__match_any_sync'
>          unsigned int subMask = __match_any_sync(thisMask, pred);
>                                 ^
>
> The error goes away if we use sm_70 or later, but we need sm_61 in our
> application.
>
> Cooperative groups have been working just fine since they were introduced
> in CUDA 9, so I wonder why this would break now. NVCC has no problems with
> the above code. Clang can also build it just fine on CUDA 10.2.
>
> Digging a bit deeper, I can narrow it down to this:
>
> // foo.cu
>
> template <typename T>
> __device__ void foo()
> {
>      __match_any_sync(0,0);
> }
>
> The above code compiles fine with NVCC + sm_61. However, *if I remove the
> template*, then NVCC complains that __match_any_sync is not defined. Clang
> complains regardless.
> Does this make any sense? Is NVCC doing some magic that allows it to ignore
> undeclared functions if they are used in a non-instantiated template
> function?
>
> PS: from what I understand, __match_any_sync is indeed only available on SM
> 7.0 or later. I still wonder how NVCC can handle this, is it a bug or a
> feature? Should Clang be updated to match NVCC?
>
> PS2: The problem goes away in CUDA 11.5, even though it's not officially
> supported in Clang. I get a warning but the code compiles. I can still see
> the __match_any_sync function in the CUDA header so I don't understand
> what's happening. Is Clang really compiling, or what does it do when using
> a "too new" CUDA version?
>
> Thanks!

