<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/54560>54560</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            CUDA: Incorrect linkage with -fgpu-rdc on kernels created from lambdas inside anonymous namespaces
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          mkuron
      </td>
    </tr>
</table>

<pre>
    **Summary**

In the example below, the PTX assembly generated by Clang declares the kernel as `.weak .entry _ZN6thrust8cuda_cub4core13_kernel_agentINS0_14__parallel_for16ParallelForAgentINS0_11__transform17unary_transform_fINS_17counting_iteratorIiNS_11use_defaultES8_S8_EEPiNS5_14no_stencil_tagEZN12_GLOBAL__N_15Thing4calcESA_iNSC_5StuffEEUliE_NS5_21always_true_predicateEEElEESH_lEEvT0_T1_`. The same example compiled with NVCC generates `.weak .entry _ZN6thrust8cuda_cub4core13_kernel_agentINS0_14__parallel_for16ParallelForAgentINS0_11__transform17unary_transform_fINS_17counting_iteratorIiNS_11use_defaultES8_S8_EEPiNS5_14no_stencil_tagEZN60_GLOBAL__N__36_tmpxft_00006b02_00000000_6_b_cpp1_ii_968400945Thing4calcESA_iNSC_5StuffEEUliE_NS5_21always_true_predicateEEElEESH_lEEvT0_T1_`. In both cases, external weak linkage is used, which is not necessary since it is coming from inside a anonymous namespace. With NVCC, this is not a problem because the anonymous namespace is mangled to a unique name (`60_GLOBAL__N__36_tmpxft_00006b02_00000000_6_b_cpp1_ii_96840094`). With Clang however, the name is not unique (mangled to [`12_GLOBAL__N_1`](https://github.com/llvm/llvm-project/blob/3251ba2d0fcf5223fce3e270b91c54f548664b4e/clang/lib/AST/ItaniumMangle.cpp#L1491)). This is a problem when passing the resulting object files to `nvlink`, which will report `nvlink fatal error: Internal error: duplicate parameter bank data not same size` or `nvlink error: Duplicate weak parameter bank for ...` depending on the CUDA version. To me, it seems like internal linkage (`.entry`) instead of weak linkage (`.weak .entry`) should be used in this case.

**Versions**

I reproduced the bug with multiple Clang versions between 12.0.0 and 14.0.0. Before abd8cd9199d1e14cae961e1067b78df7044179a3 by @yxsamliu, Clang would generate `.visible .entry` instead of `.weak .entry`, which isn't any better and actually causes the example below to fail earlier on `Multiple definition of '_ZN6thrust8cuda_cub3cub11EmptyKernelIvEEvv'()`.

**Potential workaround**

It seems like the Clang option `-funique-internal-linkage-names` should be usable as a workaround that forces the symbol names to be unique, however across all Clang versions this just gives me various internal compiler errors. But that's a different issue.

**Minimal working example**

This example uses Thrust, which makes the symbol names very lengthy, but I am pretty sure the exact same behavior can also be observed by replacing `thrust::transform` with a hand-written kernel. The lambda capture seems to be important as just putting a kernel into an anonymous namespace is not sufficient to trigger the problem.

a.cu:
```cuda
#include <thrust/transform.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/execution_policy.h>

namespace {
    struct Stuff {
        long c[2];
    };

    struct Thing {
        void calc(int *data, int n, Stuff s) {
            auto f = [s] __device__ (int i) -> int {
                return 2*i;
            };
            
            auto first = thrust::counting_iterator<int>(0);
            auto last = first + n;
            thrust::transform(thrust::device, first, last, data, f);
        }

    };
}

void runA(int * data, int n) {
    Thing t;
    Stuff s({0, 0});
    t.calc(data, n, s);
}
```

b.cu:
```cuda
#include <thrust/transform.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/execution_policy.h>

namespace {
    struct Stuff {
        long c[4];
    };

    struct Thing {
        void calc(int *data, int n, Stuff s) {
            auto f = [s] __device__ (int i) -> int {
                return 2*i;
            };
            
            auto first = thrust::counting_iterator<int>(0);
            auto last = first + n;
            thrust::transform(thrust::device, first, last, data, f);
        }

    };
}

void runB(int * data, int n) {
    Thing t;
    Stuff s({0, 0, 0, 0});
    t.calc(data, n, s);
}
```

Compile and link these as follows:
```bash
clang++ -O3 -std=c++14 --cuda-gpu-arch=sm_70 -fPIC -c a.cu -fgpu-rdc -o a.o
clang++ -O3 -std=c++14 --cuda-gpu-arch=sm_70 -fPIC -c b.cu -fgpu-rdc -o b.o
nvcc -O3 -Xcompiler -fPIC -gencode=arch=compute_70,code=[sm_70,compute_70] -dlink a.o b.o -o device.o
clang++ -std=c++14 -fPIC -O3 -shared -o ab.so a.o b.o device.o
```
The second-to-last command will fail with the nvlink error given in the summary.

For comparison, compile and link with NVCC:
```bash
nvcc -O3 -std=c++14 --expt-extended-lambda -gencode=arch=compute_70,code=[sm_70,compute_70] -Xcompiler -fPIC -c a.cu -dc -o a.o
nvcc -O3 -std=c++14 --expt-extended-lambda -gencode=arch=compute_70,code=[sm_70,compute_70] -Xcompiler -fPIC -c b.cu -dc -o b.o
nvcc -O3 -Xcompiler -fPIC -gencode=arch=compute_70,code=[sm_70,compute_70] -dlink a.o b.o -o device.o
nvcc -O3 -Xcompiler -fPIC -gencode=arch=compute_70,code=[sm_70,compute_70] -shared a.o b.o device.o -o ab.so
```
This will succeed.

**LLVM IR**

Unfortunately, Thrust is currently broken on clang trunk on godbolt.org, so I cannot post a link to the LLVM IR. Running clang locally, I can see that the IR uses `define weak_odr void @_ZN6thrust8cuda_cub4core13_kernel_agentINS0_14__parallel_for16ParallelForAgentINS0_11__transform17unary_transform_fINS_17counting_iteratorIiNS_11use_defaultES8_S8_EEPiNS5_14no_stencil_tagEZN12_GLOBAL__N_15Thing4calcESA_iNSC_5StuffEEUliE_NS5_21always_true_predicateEEElEESH_lEEvT0_T1_(%...`. My guess is that it should probably use some variation of `private` or `internal` instead of `weak_odr`.
</pre>
<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJztWVlv4zgS_jXOC2FDku-HPORw7wbb3dPopHsH8yJQFGVzIolekrLj-fX7FSn57lkMZhp7jeA4EkUW6_jqYDnT-e62l9zh89xUFTe78NCLHntR-_1UM7eSTL7xal1KlslSb3vJgx_89PIj49bKKit3bClrabiTOct27KHk9ZLlUpTcSOsnv0pTyxLzWW8SDbaSv7KBrJ3ZsfSnjxO3Mo11M9HkPBVNNhLayHiYhkUpB3H39PE5SuNRmq654WWJ4UKbePKpfXqnzd1hWpymzvDaYkoVT5sash0G0gKT0ngqdFM7VS9T5Yh1bZ4UjceNlWkuC96UbvE8S_FZLD7h1Rjb1zq1TtZClanjy8VPH-Mk_cv7H-7v3qfpxzQev6xAcCR4KRbPdykWPaTjZ9cUxWLxpVSLlKgkMS-3fGfBUSPTtZG5EtDcYrEoF4vnv6b43rxE6UuckqrYC7RneXUwgtDVWpXQ9Fa5Ffv49eFhr_3_Ne1OoiPtpsNJ6qr1W-HSCNckixJ_Q1c6SbNUrNdxqlQ6n8xGUTQffQ9rwCEyDbULbqUlT5BvkK_mJfNaL1X9CoUyZRkEzWnCdqXEigZq7VgthbQWCmNW1QLzHL2BRcEoK4yumKqtyiXjjNe63lW6wUJY3665kAP2987kwQmxtiXM2drorJQVfFRw7O297goNWlDBPQlATmNdU6t_NNJPYL1kBil_n9ZBoJfMW1ZDJFjprdxI0wUOv1XLd7s7Nj5iqje-B5VT1yKy40fMWzm3tr0hItU7fJbYpckG0CAeynLT_etDHT9L4fCYlTrDv2EyjjOe5FEhinGSDAshhzKZRtk8FuNRMR7NJpNRNpKYKohrIqRo4d3zC76fHK9VU33wXA4gdC8Zvo9H8xjCenlfWmscLLFdyZqtESLJuCQ4giFgT086I-ZYAT-2XuJJVG8IPF57HWi2qiyxaK2NO8xgBXeAmzRGG6gBiGwBuB_Jm3XpQczImyuJ9yzjWJljpde6jydW_SJBlWlzRHxP5HFPxAP7jBI8nQ0GA1qey7Wscy9USBYPXx7vGMxtla6hFs0qSSIB6lbKysJHXmH-juvOYwL0QtQKECJXcJLnTBenztVOPQpz7QK70k2JDCS982F9cBHy1cFxVgt57mtg0V7JeqR0o_NGEB4hUtYsQ7StyH4UhAOwWykttnRbCWvHySAaRPC7nMUjuh2wewllwRWzfCbyeTyf57GMEZTkfIKbaDLNprO8mEajUTyd8yHlz94o2r3BRqVqSHNhr62XrYv0PtBvlFVA2kEJxyq7pqJDNKp7yRRRo94R62RWYpkL1yDe75iPIPYy9RNUC66ANW5KhVUwOQh_6LSCwK5q5aASz0EyvZJ-hviL40W1dru_-Qz0tEGI3WAy2RXOBL4vjfVJIzE4RXFWm1dukF3yK4Y7wZgHo9edXnueQLpfhIjT7wDYb1HV9xGSdHiMIk7q5eTUh21BlzvyANGqyO6qTJchxJKGaKXfhBTexj7o1mgLQvDoM-x4jP4MFbGl2oACXHPDjaKovfeSNueb4J8WqGqc5wNqI-5yVRTSQEMwrW2uof0DDFO16iNnbe16qUMfxzqrexi8eAMe0FPx12uSQ6AdK2W9dKsdTc7A4hPjFeIhIIaM1xjZQUq0MSiTK75RCCaC19CN9crTmZVmE0pJOGLJBTEM0wQkUewf3u0LDjKZ903OVgBxf2sUAF23FWeon0peZTnHJmtHTASMBEupisIrh-Z4a4V143yM5l3RCisgUdbfyqY-oKKoUEKRATDXGbVcwlgkbJsMTizCB6IhKcIQHNN_yEE6iw1RHZQN6oDe8KGVOnm3F3mw6g0Xvzq1K7oomZ0XYv96tXyToiGPSdcaSWB3vMJ_H-TvTe_DEMNlUUPBsr7EOn1DV6mhVIHcnlAiHx697U0PzxfUfBF3SW2jVc6otEPUgIEQbO4ovflMg8eabgIjllLDxXq6eEPxDKI_UslhwRZLUZxulJBpylrCipb3oQBP9yoduoDxxtQsAR_qRLjuOhbyZPwbbCkDLBJrx6i_MCbMBrbIPMmMcuDVLTzBkrf0WsrJPZR0bfZ1L0tmx-NBSaRjT41uiD7976xQXOOGlHBm5hPjn772NjZNfXewMTsz8pllA1jcyb57FMww1SfBiPY5484NWjB1G3gE2eNpB-46lz1mNvv_durRn079p1P_Fqe-_8Od-vD1Hdz7IVRgvk72JyUkd-trw0KXKI3tpe9n3K7CUHugvCf79H8Ysr51OawmwlA8Yv0-BYr-ct30uRHwzEdbpdOI9YtPTw-sLxiVDHiiCSYXrI-CZKD_KOLZBfGsI15vhAhUf9zXoO06HEWEziXItVRpQuMkSEO37Stywaob2b-GT_Zzr0VIQZvRngF-V6W6kChw4IVdcYNSkRSSDazeEzyldmZS31OTQqNcdLrvnQjcVWRbf-b2ZxxfVPp-xdHR2NfodThagkbom54Ud--omoWoKOGt9igT59DZN-5-DTMHzV_aU76tXZ86T3Uu835b3f5eg1wYuEPdGd7-ExjLjhj792P1-23covsc1Hu4fwPdOJV4HNtGCCnzK8fB9--_fmBPny-Pf19qZAbX1NzJ0h_jwunPtykbQ0fMcscyo1_hBThSezfFgaeBfvC41DkOhG6gzdLHV40TII52dERaa0ttyhA7tXeflokB-9zUNcX4QK3UgroQRMCvpvNaOHTToqfP4VQKcX27ITSoUp2bUD70RtF_d9P7-_6kQD2WcWjdDdiHHVs20vq-pVcwNelC-4POrZx-2aFestVtU4LvezuTaG3UBpscWohdt-KyEdVZiDa9yW-H-Xw45zdOuVLeUrsw9DFhIkNd0a7P58PkIS3p7kwPJBrpf2vyHfMQZey-c355Src3jSlvf3Pj2LdSUCK8G4_Gk-hmdTuXwywSecLpx4XROMojMZnwLIujcV4Ms_FNyTOwd0v96ySp5TZ0Y3APZ75Rt0mUJBH1oafxbDweRNNonM3iaDScxPF4MgZyZYXMMyA-yIVuzK1nKWuWFi9LZZ09vKTG8rKW0m8H-igIV9rcVq-N0fWN3_nWc_5PZlu9UA">