<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/141626>141626</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Regression: Cannot compile cuda code that calls CUB function with clang 20
</td>
</tr>
<tr>
<th>Labels</th>
<td>
clang
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
jabraham17
</td>
</tr>
</table>
<pre>
I am running CUDA code that uses CUB on an NVIDIA A100 GPU. I am compiling it with clang++ 20 and running into issues that were not present with clang++ 19.
The following is my simplified test case, compiled as `clang++ bug.cu -I.... -L... -lcuda -lcudart` with CUDA 12.3.
```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include <cub/cub.cuh>
int main() {
    int deviceIdx = 0;
    cudaSetDevice(deviceIdx);

    int N = 100;
    float* arr;  // input (left uninitialized; only the return codes matter here)
    cudaMalloc(&arr, N * sizeof(float));
    float* sum;  // single-element output
    cudaMalloc(&sum, sizeof(float));

    // First call: with temp_storage == nullptr, CUB only computes the
    // required temporary storage size and writes it to n_temp_storage.
    void *temp_storage = nullptr;
    size_t n_temp_storage = 0;
    cudaError_t res = cub::DeviceReduce::Sum(temp_storage, n_temp_storage, arr, sum, N, (cudaStream_t)0);
    if (res != cudaSuccess) {
        printf("Error: %d: %s\n", (int)res, cudaGetErrorString(res));
        return -1;
    }

    // Second call: performs the reduction using the allocated storage.
    cudaMalloc(&temp_storage, n_temp_storage);
    cudaError_t res2 = cub::DeviceReduce::Sum(temp_storage, n_temp_storage, arr, sum, N, (cudaStream_t)0);
    if (res2 != cudaSuccess) {
        printf("Error: %d: %s\n", (int)res2, cudaGetErrorString(res2));
        return -1;
    }

    float sum_host;
    cudaMemcpy(&sum_host, sum, sizeof(float), cudaMemcpyDeviceToHost);
    printf("Sum: %f\n", sum_host);

    cudaFree(temp_storage);
    cudaFree(sum);
    cudaFree(arr);
    return 0;
}
```
With clang 19.1.3, the code compiles and runs, reaching the `printf("Sum: ...` line. With clang 20.1.4, it also compiles, but the first `cub::DeviceReduce::Sum` call (the storage-size query) returns an error and the program exits with:
```
Error: 209: no kernel image is available for execution on the device
```
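To check whether the failure is specific to CUB, a reduced program along the following lines could be built with the same clang 20 command. It is a sketch under assumptions: `fill_kernel`, `run_fill`, and `is_missing_kernel_image` are hypothetical names, and only standard CUDA runtime calls are used. If even this trivial launch fails with error 209 (`cudaErrorNoKernelImageForDevice`), the likely cause is the device code clang embeds for this GPU rather than CUB itself.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical trivial kernel: each thread writes its index into out[i].
__global__ void fill_kernel(float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = (float)i;
}

// True if err is the error seen in the report (value 209).
bool is_missing_kernel_image(cudaError_t err) {
    return err == cudaErrorNoKernelImageForDevice;
}

// Launches fill_kernel once and returns the first error observed.
cudaError_t run_fill(int n) {
    float* out = nullptr;
    cudaError_t err = cudaMalloc(&out, n * sizeof(float));
    if (err != cudaSuccess) return err;
    fill_kernel<<<(n + 127) / 128, 128>>>(out, n);
    err = cudaGetLastError();                  // launch errors, e.g. no kernel image
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();         // errors raised while the kernel runs
    }
    cudaFree(out);
    return err;
}

int main() {
    cudaError_t err = run_fill(100);
    if (err != cudaSuccess) {
        printf("Error: %d: %s%s\n", (int)err, cudaGetErrorString(err),
               is_missing_kernel_image(err) ? " (no device code for this GPU)" : "");
        return 1;
    }
    printf("Trivial kernel ran successfully\n");
    return 0;
}
```

When comparing clang 19 and clang 20, pinning the target architecture explicitly, for example with `--cuda-gpu-arch=sm_80` (the A100 is compute capability 8.0), would help determine whether a change in the default target architecture is involved.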
Compiling the same test case with `nvcc` from CUDA 12.3 also works. I have tested multiple CUDA versions and got the same results. I am reporting 12.3 here because it is the newest CUDA version that both clang 19 and clang 20 accept without a CUDA version warning.
To me, this points to a regression in clang's CUDA support between clang 19 and clang 20.
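For triage, a small program like the following can report the CUDA versions involved and the GPU's compute capability. This is a sketch: `device_compute_capability`, `cuda_version_major`, and `cuda_version_minor` are hypothetical helpers built on `cudaRuntimeGetVersion`, `cudaDriverGetVersion`, and `cudaGetDeviceProperties`. The first two encode versions as `1000 * major + 10 * minor` (12030 for 12.3).

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Split a CUDA version integer (1000 * major + 10 * minor) into its parts.
int cuda_version_major(int v) { return v / 1000; }
int cuda_version_minor(int v) { return (v % 1000) / 10; }

// Returns the compute capability of `dev` as major * 10 + minor
// (80 for an A100), or -1 if the device cannot be queried.
int device_compute_capability(int dev) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) return -1;
    return prop.major * 10 + prop.minor;
}

int main() {
    int runtime = 0, driver = 0;
    cudaRuntimeGetVersion(&runtime);  // version of the CUDA runtime in use
    cudaDriverGetVersion(&driver);    // newest CUDA version the driver supports (0 if none)
    printf("CUDA runtime %d.%d, driver supports up to %d.%d\n",
           cuda_version_major(runtime), cuda_version_minor(runtime),
           cuda_version_major(driver), cuda_version_minor(driver));
    printf("Device 0 compute capability: %d\n", device_compute_capability(0));
    return 0;
}
```

On an A100, the compute capability line should read 80 (that is, 8.0).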
</pre>