[cfe-dev] [CUDA][RFC] Clang support for CUDA 11.4, issues with <cooperative_groups.h>

Thu Nov 4 11:20:28 PDT 2021

Thanks for the reply! Indeed, declaring the function myself somewhere is
probably the easiest workaround.
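
For anyone hitting the same error, here is a minimal plain C++ sketch of the
lookup rule involved and of the declare-it-yourself workaround. It is not CUDA
code: `match_any` is a hypothetical stand-in for __match_any_sync, and its body
is a placeholder rather than the real intrinsic's behaviour.

```cpp
// Plain C++ sketch; `match_any` is a hypothetical stand-in for
// __match_any_sync and does not reproduce the real intrinsic.

// Workaround: make a declaration visible before the template that uses it.
constexpr unsigned match_any(unsigned mask, unsigned value) {
    return mask & value;  // placeholder body for illustration only
}

template <typename T>
constexpr unsigned foo(unsigned mask, unsigned value) {
    // Neither argument depends on T, so `match_any` is a non-dependent name.
    // It is looked up here, at the point of definition, even if foo<T> is
    // never instantiated. Without the declaration above, Clang reports
    // "use of undeclared identifier".
    return match_any(mask, value);
}
```

Removing the `match_any` declaration makes Clang reject the template at its
definition, with no instantiation needed. For the real header, the declaration
would have to come before `#include <cooperative_groups.h>` and match the
signature the header expects, which this sketch does not attempt to reproduce.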

I asked in the NVIDIA forums
<https://forums.developer.nvidia.com/t/cuda-11-4-cooperative-groups-no-longer-supported-on-sm-7-0/194001/2>
and it seems they forgot to guard the problematic code inside an #ifdef that
is only active for SM 7.x or later. I can see they've fixed it in CUDA 11.5,
and indeed using CUDA 11.5 with Clang solves the problem (modulo a Clang
warning about a "too new" CUDA version).
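
A minimal sketch of the kind of guard described above, assuming the usual
__CUDA_ARCH__ convention. The EXAMPLE_HAS_MATCH_ANY macro and the helper
function are hypothetical names chosen for illustration; they are not copied
from the CUDA 11.5 headers.

```cpp
// Hypothetical feature macro; not copied from the CUDA headers.
// __CUDA_ARCH__ is defined only while compiling device code
// (for example 610 for sm_61 and 700 for sm_70).
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
#define EXAMPLE_HAS_MATCH_ANY 1
#else
#define EXAMPLE_HAS_MATCH_ANY 0
#endif

// Code that needs the SM 7.x intrinsic goes behind the guard, so the
// compiler never sees it when targeting older architectures.
#if EXAMPLE_HAS_MATCH_ANY
// Calls to __match_any_sync would live here.
#endif

// Lets the sketch be checked from ordinary host C++.
constexpr bool example_has_match_any() { return EXAMPLE_HAS_MATCH_ANY != 0; }
```

Compiled as ordinary host C++, __CUDA_ARCH__ is undefined, so the guarded
region is skipped and `example_has_match_any()` is false.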

Therefore my next question is: would it be possible to bump the supported
CUDA version in Clang to 11.5? I could do it myself following the steps
done for 11.4, but I'm not sure what things I should be looking out for or
how to test it. Let me know if I should open a new thread for this new
question.

/Carlos

On Thu, Nov 4, 2021, 18:52 Johannes Doerfert <johannesdoerfert at gmail.com>
wrote:

> It's not a CUDA issue per se (IMHO) and not related to the builtin at all.
>
> C++ says this ain't alright, nvcc is just very forgiving apparently:
> https://godbolt.org/z/14xYfbKdz
>
> I'd recommend either declaring the builtin yourself in your header or using
> a CUDA version that comes with a declaration.
>
> ~ Johannes
>
>
> On 11/4/21 02:44, Carlos Galvez via cfe-dev wrote:
> > Hi!
> >
> > I wanted to ask what's the expected support for CUDA 11.4 in Clang? We want
> > to update from 10.2 to 11.4, and came across the following issue when
> > compiling:
> >
> > // foo.cu
> > #include <cooperative_groups.h>
> >
> > Just by including this header, and compiling for "--cuda-gpu-arch=sm_61",
> > we get the following error building with Clang:
> >
> > /usr/local/cuda-11.4/targets/x86_64-linux/include/cooperative_groups/details/partitioning.h:85:32:
> > error: use of undeclared identifier '__match_any_sync'
> >          unsigned int subMask = __match_any_sync(thisMask, pred);
> >                                 ^
> >
> > The error goes away if we use sm_70 or later, but we need sm_61 in my
> > application.
> >
> > Cooperative groups have been working just fine since they were introduced
> > in CUDA 9, so I wonder why this would break now. NVCC has no problems with
> > the above code. Clang can also build it just fine on CUDA 10.2.
> >
> > Digging a bit deeper, I can narrow it down to this:
> >
> > // foo.cu
> >
> > template <typename T>
> > __device__ void foo()
> > {
> >      __match_any_sync(0,0);
> > }
> >
> > The above code compiles fine with NVCC + sm_61. However, *if I remove the
> > template*, then NVCC complains that __match_any_sync is not defined. Clang
> > complains regardless.
> > Does this make any sense? Is NVCC doing some magic that allows it to ignore
> > undeclared functions if they are used in a non-instantiated template
> > function?
> >
> > PS: from what I understand, __match_any_sync is indeed only available on SM
> > 7.0 or later. I still wonder how NVCC can handle this, is it a bug or a
> > feature? Should Clang be updated to match NVCC?
> >
> > PS2: The problem goes away in CUDA 11.5, even though it's not officially
> > supported in Clang. I get a warning but the code compiles. I can still see
> > the __match_any_sync function in the CUDA header so I don't understand
> > what's happening. Is Clang really compiling, or what does it do when using
> > a "too new" CUDA version?
> >
> > Thanks!