<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/120836>120836</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[Clang] [NVPTX] [Windows] Compilation errors with CUDA NVPTX backend and MSVC headers
</td>
</tr>
<tr>
<th>Labels</th>
<td>
clang
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
blinkfrog
</td>
</tr>
</table>
<pre>
**Describe the bug**
When compiling CUDA code with Clang's NVPTX backend on Windows against the MSVC headers, compilation fails with multiple errors because Clang's `__builtin_va_list` type is incompatible with MSVC's `va_list` (a typedef for `char *`). This appears to be a mismatch in how Clang and MSVC define `va_list` when compiling CUDA device code.
**To Reproduce**
Steps to reproduce the behavior:
1. Use the following simple CUDA program (`test.cu`):
```
#include <iostream>
#include <cuda_runtime.h>

__global__ void addKernel(int *c, const int *a, const int *b) {
    int i = threadIdx.x;
    c[i] = a[i] + b[i];
}

int main() {
    const int arraySize = 5;
    int a[arraySize] = {1, 2, 3, 4, 5};
    int b[arraySize] = {10, 20, 30, 40, 50};
    int c[arraySize] = {0};
    int *dev_a = nullptr, *dev_b = nullptr, *dev_c = nullptr;

    cudaMalloc((void**)&dev_a, arraySize * sizeof(int));
    cudaMalloc((void**)&dev_b, arraySize * sizeof(int));
    cudaMalloc((void**)&dev_c, arraySize * sizeof(int));

    cudaMemcpy(dev_a, a, arraySize * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, arraySize * sizeof(int), cudaMemcpyHostToDevice);

    addKernel<<<1, arraySize>>>(dev_c, dev_a, dev_b);

    cudaMemcpy(c, dev_c, arraySize * sizeof(int), cudaMemcpyDeviceToHost);

    std::cout << "Results: ";
    for (int i = 0; i < arraySize; ++i) {
        std::cout << c[i] << " ";
    }
    std::cout << std::endl;

    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    return 0;
}
```
2. Compile the code using Clang with the following command (adjust paths according to your environment):
```
clang++ -std=c++14 --cuda-gpu-arch=sm_75 --cuda-path="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4" -L"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\lib\x64" -lcudart_static -ldl -lrt -pthread test.cu -o test.exe
```
3. Observe the following errors (truncated):
```
In file included from <built-in>:1:
In file included from C:\llvm\lib\clang\19\include\__clang_cuda_runtime_wrapper.h:472:
In file included from C:\llvm\lib\clang\19\include\__clang_cuda_cmath.h:16:
In file included from C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include\limits:12:
In file included from C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include\cwchar:11:
In file included from C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include\cstdio:11:
In file included from C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt\stdio.h:13:
C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt\corecrt_wstdio.h:486:24: error: non-const lvalue reference to type '__builtin_va_list' cannot bind to a value of unrelated type 'va_list' (aka 'char *')
  486 |         __crt_va_start(_ArgList, _Locale);
      |                        ^~~~~~~~
C:\llvm\lib\clang\19\include\vadefs.h:39:54: note: expanded from macro '__crt_va_start'
   39 | #define __crt_va_start(ap, param) __builtin_va_start(ap, param)
      |                                                      ^~
In file included from <built-in>:1:
In file included from C:\llvm\lib\clang\19\include\__clang_cuda_runtime_wrapper.h:472:
In file included from C:\llvm\lib\clang\19\include\__clang_cuda_cmath.h:16:
In file included from C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include\limits:12:
In file included from C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include\cwchar:11:
In file included from C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include\cstdio:11:
In file included from C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt\stdio.h:13:
C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt\corecrt_wstdio.h:488:22: error: non-const lvalue reference to type '__builtin_va_list' cannot bind to a value of unrelated type 'va_list' (aka 'char *')
  488 |         __crt_va_end(_ArgList);
      |                      ^~~~~~~~
(truncated)
```
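For context, the flagged lines sit inside variadic wrapper functions defined in the UCRT headers, which forward their `...` arguments through a `va_list` using the standard `va_start`/`va_end` pattern. Below is a minimal portable C++ sketch of that pattern; `format_into` is a hypothetical helper, not a UCRT function. With Clang, `va_start(args, fmt)` typically expands to `__builtin_va_start(args, fmt)`, the same builtin that `__crt_va_start` maps to in the log, so `args` must be a modifiable lvalue of the target's `__builtin_va_list` type.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>

// Hypothetical helper (not a UCRT function) illustrating how a variadic wrapper
// forwards its "..." arguments to a v*-style function through a va_list.
int format_into(char* buf, std::size_t size, const char* fmt, ...) {
    assert(buf != nullptr || size == 0);  // vsnprintf allows a null buffer only when size is 0

    std::va_list args;
    // va_start initializes args in place, so args must be a modifiable lvalue
    // of the target's va_list type. With Clang and GCC this macro typically
    // expands to __builtin_va_start(args, fmt).
    va_start(args, fmt);
    int written = std::vsnprintf(buf, size, fmt, args);
    va_end(args);    // every va_start must be matched by va_end
    return written;  // characters that would be written, excluding the terminator
}
```

Called as `format_into(buf, sizeof buf, "%s=%d", "x", 42)`, it writes `x=42` into `buf` and returns 4.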
**Expected behavior**
The program should compile without errors related to standard library headers.
**Observed behavior**
Compilation fails with the `va_list` errors shown above, indicating a mismatch between Clang's built-in `__builtin_va_list` type and MSVC's `va_list` definition in CUDA device mode.
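The diagnostic itself is the ordinary C++ rule that a non-const lvalue reference `T&` cannot bind to an lvalue of an unrelated type. The sketch below models it with standard type traits. `device_builtin_va_list` and `opaque_va_state` are hypothetical stand-ins: the log only states that the device-side `__builtin_va_list` is unrelated to `char *` and does not show its actual definition. The "host" case assumes the host pass sees matching types, since the reported failures are attributed to device compilation.

```cpp
#include <type_traits>

// Host-side va_list on x64 Windows, as reported by the diagnostic:
// 'va_list' (aka 'char *').
using host_va_list = char*;

// Hypothetical stand-in for the device target's __builtin_va_list. The log only
// says the real type is unrelated to char*; its actual definition is not shown.
struct opaque_va_state {};
using device_builtin_va_list = opaque_va_state*;

// True when an lvalue of type Arg can initialize a non-const reference Builtin&,
// which is what the diagnostic says __builtin_va_start requires of its first argument.
template <typename Builtin, typename Arg>
constexpr bool binds_to_builtin_ref() {
    return std::is_convertible<Arg&, Builtin&>::value;
}

// Host pass (assumed): header va_list and builtin va_list are the same type.
static_assert(binds_to_builtin_ref<host_va_list, host_va_list>(),
              "identical types bind");
// Device pass (hypothetical type): char* cannot bind to an unrelated reference.
static_assert(!binds_to_builtin_ref<device_builtin_va_list, host_va_list>(),
              "unrelated types do not bind");
```

`std::is_convertible<Arg&, Builtin&>` asks whether an `Arg` lvalue can initialize a `Builtin&`. Any pointer conversion would yield a temporary, which a non-const lvalue reference cannot bind to, so unrelated types fail exactly as in the log.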
**Environment:**
- LLVM Clang version: 19.1.6
- CUDA toolkit version: 12.4
- Visual Studio version: 2022 Community Edition 17.12.3
- Windows SDK version: 10.0.22621.0
Command used for compilation:
```
clang++ -std=c++14 --cuda-gpu-arch=sm_75 --cuda-path="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4" -L"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\lib\x64" -lcudart_static -ldl -lrt -pthread test.cu -o test.exe
```
**Additional context**
These errors suggest a compatibility issue between Clang’s NVPTX backend and the MSVC headers when those headers are parsed during device-side compilation. This blocks our development with Clang + CUDA on Windows. Any guidance or fixes would be greatly appreciated.
Thank you for looking into this issue.
</pre>