<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/57178>57178</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Cuda kernel missing and wrong kernel called when compiling with clang-cl
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          Danvil
      </td>
    </tr>
</table>

<pre>
    I am trying to compile a CUDA program with clang-cl on Windows. The compiled binary does not appear to contain all kernels, and it invokes the wrong kernel. See the example below:

File `bug.cu`:
```
#include <iostream>
#include "cuda.h"

__global__ void a(int* v) {
  v[0] = 0;
}

__global__ void b(int* v) {
  v[0] = 1;
}

int main() {
  int* v;
  cudaMallocManaged(&v, sizeof(int));

  a<<<1,1>>>(v);
  b<<<1,1>>>(v);
  cudaDeviceSynchronize();

  std::cout << (v[0] == 1 ? "SUCCESS" : "ERROR") << std::endl;

  cudaFree(v);

  return 0;
}
```

Compilation command line:
```
& "C:\Program Files\LLVM\bin\clang-cl.exe" -clang:--cuda-gpu-arch=sm_75 -clang:--cuda-path="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.5"  -clang:-std=c++17 bug.cu -o bug.exe -D__CUDACC_VER_MAJOR__=11 /O2 /link /LIBPATH:"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.5\lib\x64" cuda.lib cudart_static.lib; .\bug.exe
```

Some observations:
* Debugging this program with Nsight shows that the CUDA runtime only loads the first kernel, `a`, and launches it twice.
* If the definition of `a` and the launch of kernel `a` are removed from the example above, the example succeeds.
* If an additional dummy argument is added to kernel `b`, the example also succeeds.
* It looks as if, when multiple kernels with the same function signature appear, only the first one is used.
* No CUDA errors appear to be reported.
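
For anyone trying to reproduce the last observation, here is a minimal sketch of the same program with every runtime call and both launches checked explicitly. `CHECK_CUDA` is a hypothetical helper macro introduced only for this illustration; it is not part of the CUDA API or of the original `bug.cu`. The sketch assumes `cuda_runtime.h` is reachable through the `--cuda-path` passed on the command line.

```
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): print the failing call and exit
// if a runtime API call returns anything other than cudaSuccess.
#define CHECK_CUDA(expr)                                           \
  do {                                                             \
    cudaError_t err_ = (expr);                                     \
    if (err_ != cudaSuccess) {                                     \
      std::fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,     \
                   __LINE__, #expr, cudaGetErrorString(err_));     \
      std::exit(EXIT_FAILURE);                                     \
    }                                                              \
  } while (0)

// Same two kernels as in bug.cu: identical signatures, different bodies.
__global__ void a(int* v) { v[0] = 0; }
__global__ void b(int* v) { v[0] = 1; }

int main() {
  int* v = nullptr;
  CHECK_CUDA(cudaMallocManaged(&v, sizeof(int)));

  a<<<1, 1>>>(v);
  CHECK_CUDA(cudaGetLastError());       // errors from launching a
  b<<<1, 1>>>(v);
  CHECK_CUDA(cudaGetLastError());       // errors from launching b
  CHECK_CUDA(cudaDeviceSynchronize());  // errors raised while the kernels ran

  // b runs last on the same stream, so v[0] should be 1 if both kernels ran.
  std::printf("v[0] = %d (expected 1)\n", v[0]);

  CHECK_CUDA(cudaFree(v));
  return 0;
}
```

If this variant exits cleanly but still prints a value other than 1, that would support the observation that the wrong kernel is launched without any error being surfaced by the runtime.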

System:
* LLVM 14.0.6
* CUDA Toolkit 11.5
* NVIDIA driver version 516.94
* Windows 10, 64 bit
* Graphics card: GeForce RTX 2080 Ti (SM75)

</pre>