<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/88883>88883</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            CUDA on Windows: COMDAT folding may cause wrong kernel to be launched
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          mkuron
      </td>
    </tr>
</table>

<pre>
    **Summary**

CUDA's kernel launch mechanism requires each kernel's device stub function to have a unique address. When targeting Windows, the linker defaults to performing identical COMDAT folding (ICF). Since the device stubs tend to be identical when the kernel arguments are the same, this deduplication can lead to non-unique addresses. In the end, attempting to invoke any of the kernels whose device stubs were folded into one will always invoke the first of these kernels.

**Version history**

@kpyzhov originally fixed this problem in 15e678e8438c56703c775fcb45afb25e09b31cff (Clang 9.0) by removing the COMDAT attribute from kernels and device stubs. @yxsamliu then introduced a regression in 80072fde61d40a4e8a9da673476730d34a483fa2 (Clang 14.0) that reallows COMDAT attributes on device stubs.

**Minimal working example**

```cuda
#include <stdio.h>
#include <cuda.h>

template<typename F>
__global__
void forEach(F f, int n) {
    int i = blockIdx.x*blockDim.x + threadIdx.x;
    if (i < n) {
        f(i);
    }
}

void check(cudaError_t err, unsigned line) {
    if (err != cudaSuccess) {
        printf("CUDA error in line %d: %s\n", line, cudaGetErrorName(err));
        exit(1);
 }
}
#define CHECK(err) check(err, __LINE__)

struct Test {
 void bad() {
    float *data = nullptr;

    forEach<<<1,1>>>([data] __host__ __device__ (int i) {
#ifdef __CUDA_ARCH__
        __trap(); 
#endif
    }, 1);
  }

  void good() {
    float *data;
    CHECK(cudaMalloc(reinterpret_cast<void**>(&data), sizeof(float)));
    CHECK(cudaMemset(data, 0xff, sizeof(float)));

 forEach<<<1,1>>>([data] __host__ __device__ (int i) {
        data[i] = 1.0f;
    }, 1);
    CHECK(cudaDeviceSynchronize());

    float host;
    CHECK(cudaMemcpy(&host, data, sizeof(float), cudaMemcpyDeviceToHost));
    if (host == 1.0f) {
 printf("success\n");
    } else {
 printf("failure\n");
    }

    CHECK(cudaFree(data));
 }
};


int main(int argc, char** argv)
{
    Test t;
 if (argc > 10) {
        t.bad();
    } else {
        t.good();
 }
    return 0;
}
```

**Reproduction**

Compile the above code with `clang.exe --target=x86_64-pc-windows-msvc -std=c++17 -O3 --cuda-gpu-arch=sm_75 -lcudart -fuse-ld=lld-link.exe -o test.exe test.cu` and then execute the binary without any command-line arguments. Observe a `cudaErrorLaunchFailure` coming from `cudaDeviceSynchronize()`. The issue can also be observed without `-fuse-ld=lld-link.exe`.

Now, add `-Wl,/OPT:NOICF` and observe that the error goes away. This is a fine workaround for the present bug, but obviously ICF may be desirable outside of CUDA device stubs.

For comparison, compile with NVCC (`nvcc.exe -std=c++17 -O3 -gencode arch=compute_75,code=sm_75 -cudart shared -extended-lambda -o test_nvcc.exe test.cu`) and observe that it completes successfully.

I've also posted the code to Godbolt: https://cuda.godbolt.org/z/15s87KPz6. Here you can see that Clang trunk and 8.0 generates the device stub as `define linkonce_odr dso_local ... comdat`, while Clang 11.0 generates it as `define linkonce_odr dso_local ...`.
</pre>
<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJy0WF2T4jbT_TWamy4oI4MxF3MxY5bsVJLdVHbf7KVLttpY78gSkWQ-9tc_1bJhYD6e5OaZohiM1Ufdp4_kI4T3amsQ79nikS3Wd6IPrXX33XPvrLmrrDzdM_7A-MO3vuuEOw0XLFmzZHwv_m_9wPjSwzM6gxq06E3dQod1K4zyHTj8u1cOPaCo23FUDJC4VzWCD30FTW_qoKyBYKEVewQBvVF_9whCSofeT-FHiwaCcFsMymzhhzLSHjzjBYQWQSvzjA4kNqLXwRPODl1jXUeDlUQTVC00FF9_Xz98h8ZqSTcYz5-KDeOrKXxTpsaIdZWYh4BGElqFVyiHmEuL56KF2_YdmuBBuAHDiw6H3BRVKvudVrWIJdbCgEYRUY01k9tC0U_haQBHIwlChIDdLhYdLCizt88IwpzANlc5eDi01r9K_oAOY60oQZlgwRqEg9IahD6Ikz-jEUyjnA8jpr-gTq97PXT_L3Se6miVD_Y9SbB58rw7_WztHqxTW2WE1ido1BHlwMfO2UpjB8rAbIHZMsd8nub1Ilsmab1cLpq6mi9EU_EFJqsqndVNQ50qtDBbWE0TxldQncBhZ_eRlhbPjRUhOFX1AaFxtrtwI4y8YWYKbJ6cjl50WvUUb4gfZ2VfowQBDrfUDCpTGciTZMkbidlMzhMxx1yspMiW6XyZLdNEpnMxz9NG8JckZ_Mhy9CKAA6F1vbg3-TowZrbtN6y_bsyqiPJWfdMteJRdDuN77CeJcOr7qU4I6TK1LqXCCwtfJDKTluWfnrvLkVd34zvJDwtArK0CKcdGtEhbC5jynKrbSV0WQ7Xe6skNNZ9EnXLeL6BhuSrTABDXLDl4zAOAOK3Cli6hkrb-vlJHqdHxh_ixVp10yMw_gihdSjkcDO9jo56oPjiHWz6a-g-46ubMLZcj9VdPrwkXrdYPzOeExGfnLOuDIDOUQm9ibukpF0G35YSk0HngPEZVUQI3_q6Ru_fzW3nlAmUIOOctk-axjoSGuED4wvJ0gf679miMIxzSmKYu4jov2CIKX6Ju0we01y9rpb-8KgC4_ns5t47NPBUYkOTF58_Fb9eIC-kjESU5W9PXz6VJcFd8eeD6-sA39GHq2ojq5WQsdBXNDTaigCMP0gRRJSB6bXeBXfJ8mrsqKi0GF4zxosZiXB48ZwtHgmGLdZQlq31oSyhLId1VZZRKaS2myRI_I3EBsqSWlA-_Fl8Puv4TF5ZBid2Q_osfYRLJBqpmltZ8QJuSYZXGoOBj621_0zITRvPHaG-_04bSc147lCZgG7nMJS18IGlBcGPu8LACs8iFgmjAK9-oiXNxYlGtbwSzM1M2Hkk6QwYBSTHpvk3QCPc_6Rp58YMgY-Kgkk7s2nSvFnobztyW-E6TvXtZOrWWaN-4tjot7Vc2kNp_jfG6t1pID4O5AWcyXuHtGEhD0FDKt_t5xj2pi_DBkOYVO2l4BtmrvcUP-49573jzSYIqD1-ENwIpXuHHwe_puaag41DvGhm9fGmc0vw8E7t7oQyY-eF29aRpVa4Qdb01f5l57kWRdx5rlozMEYQwNJPMEvelVGYXnanf2LoEvGyft8pjYY4DL0zkLzUeCn8_IB--5D_E3fRe5A_fMdi226n9ODSRGX3CLWV5ONCC_S8J8cxxSPCZDL4Y5auj3lWZvPJrp4cBqM86fy-hokPkqXrmvFHxh9nS5h8TWEyod5Ntrt-Ihyt2bXvyuUCJpq-dwEmTe9xoilSazkhrz3MZyGgD_Fz_FD3LEui2YqWCo9YkxWjxCtlhDvFpG0fon-tbdcJE_HwxUJP4Wvl0cUzwOhm4rPut3iw2IzyzBIKJ0MUfd448KM1nSVT-N4iKO97jAZcaB89vR3mkpfEWJZ8UC6hXLfliz1Eey5lDPqhGS8Y33z94ztLH758pXPFSMY4yeAGo7OPD_ytRQ_iIE6UnPKgPAiIT2Eye8LZ3kQ_FUN2Dj2aAFW_pVmrPoCt9sr2Xp_gqdhAJ05UkESvnKg0gu2DVxLJ0keX8aHR3FhHZO6EU570V8QrUlyU2Je_ioKWE8sSs6_rofPv6miLJipzVBGh9AHL5YLxgm68KGsUlm-FQwkTPNIxC-VEi66S4iys8jLfi7poJb_hVIWYskby1OP21_Ran27qfGJ8SbKi1u-sD_E4Mi6mYOEXKyurA1mvNoSdZ-kD4xvGN9Eab4e7U-u2jG9-Mr6ZLXy-_PWPn9kUPtM562T7KC2PY1bDUSC43jzHlPNpAls06ASl-eqkCcKTjEYbRpqzpsbSSgfS21JbOndOp1MqVIoQmSjg0FKbxjPH7AZfhX8LScK-k_epXKUrcYf3s-Us5dlqxdO79j5LpcQsmzeZWCVpNkvSRjSIgkvEeY3JnbrnCZ8n81k247N8nk6zZFVJkdR5IufzPGvYPMFOKD3Vet8Rf3dxGd7neZ6nd1pUqH38_YFzg4dhjdKjZ7G-c_cUM6n6rWfzRCsf_AtKUEHjfZS2NZffA9KH12d8Whi16D3CwVmzPZ_ZhzP98HMFyrve6fvbtm9VaPtqWtuO8Q3NOv6b7Jz9f6wD45uYq2d8E2v5TwAAAP__G9o3VA">