[libcxx-commits] [libcxx] [libc++] Protect the libc++ implementation from CUDA SDK's `__noinline__` macro (PR #73838)
David Olsen via libcxx-commits
libcxx-commits at lists.llvm.org
Sun Jan 14 15:27:44 PST 2024
dkolsen-pgi wrote:
The easiest workaround that I can think of is to use `noinline` instead of `__noinline__` when compiling CUDA code. To do that, just change this code in `__config`:
```
#  if __has_attribute(__noinline__)
#    define _LIBCPP_NOINLINE __attribute__((__noinline__))
#  else
#    define _LIBCPP_NOINLINE
#  endif
```
to:
```
#  if defined(__CUDACC__) || defined(__CUDA_ARCH__) || defined(__CUDA_LIBDEVICE__)
#    define _LIBCPP_NOINLINE_ATTR_NAME noinline
#  else
#    define _LIBCPP_NOINLINE_ATTR_NAME __noinline__
#  endif
#  if __has_attribute(_LIBCPP_NOINLINE_ATTR_NAME)
#    define _LIBCPP_NOINLINE __attribute__((_LIBCPP_NOINLINE_ATTR_NAME))
#  else
#    define _LIBCPP_NOINLINE
#  endif
```
(The condition in the first `#if` matches the condition in CUDA's `crt/host_defines.h` where `__noinline__` is defined as a macro.)
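For context, the clash looks roughly like this; the `host_defines.h` body below is paraphrased for illustration, not the exact SDK text:
```
// Approximation of what CUDA's crt/host_defines.h does on GCC-compatible
// compilers: define __noinline__ as an object-like macro.
#define __noinline__ __attribute__((noinline))

// With that macro active, libc++'s current spelling
//   __attribute__((__noinline__))
// expands to
//   __attribute__((__attribute__((noinline))))
// which is ill-formed, so any libc++ header that uses _LIBCPP_NOINLINE fails
// to compile. Spelling the attribute as plain `noinline` sidesteps the macro.
```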
No pushing/popping of macros or creating wrapper headers. No maintenance burden going forward if `_LIBCPP_NOINLINE` is used in more places.
This is just a workaround, not a fix. Compilation can still fail if CUDA programs also define `noinline` as a macro. But I expect that to be extremely rare, because defining the macro `noinline` would likely break the CUDA headers.
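For concreteness, the residual failure mode would look something like this; the user macro and the header choice below are invented for illustration:
```
// Invented example: a CUDA program that defines `noinline` as a macro
// before including a libc++ header that uses _LIBCPP_NOINLINE.
#define noinline __attribute__((noinline))
#include <string>

// Under the workaround, __attribute__((_LIBCPP_NOINLINE_ATTR_NAME)) now
// expands to __attribute__((__attribute__((noinline)))) -- the same
// ill-formed nesting, just triggered by the plain spelling.
```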
I have no objection to withholding any workaround until NVIDIA shows some interest in improving the situation on the CUDA side of things. But if there is progress in that area and libc++ wants to check in a workaround on their side, I think this is the way to go.
(Full disclosure: I work for NVIDIA, though not in the CUDA or NVCC organizations, so I don't have any direct say in what CUDA does. I am working with @jrhemstad on "communicating with the relevant internal teams to come to a better solution that everyone can be happy with.")
https://github.com/llvm/llvm-project/pull/73838