<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/132068">132068</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            CUDA/HIP: lambda capture of constexpr variable inconsistent between host and device
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          mkuron
      </td>
    </tr>
</table>

<pre>
    Consider the following bit of HIP code:

```c++
#include <algorithm>

using std::max;

template<typename F>
static void __global__
kernel(F f)
{
  f(1);
}

void test(float const * fl, float const * A, float * Vf)
{
  float constexpr small(1.0e-25);

  auto f = [=] __device__ __host__ (unsigned int n) {
    float const value = max(small, fl[0]);
    Vf[0] = value * A[0];
  };
  static_assert(sizeof(f) == sizeof(fl) + sizeof(A) + sizeof(Vf));
  kernel<<<1,1>>>(f);
}
```
The `static_assert` fails in the host-side compilation but succeeds in the device-side compilation. This means that the layout of the struct synthesized from the lambda is inconsistent between host and device, so if you use any of the captured variables on the device side, they contain the data of one of the other captures. You can also see this with `-Xclang -fdump-record-layouts`. Evidently the `constexpr` variable is among the captured variables on the host side but not on the device side.
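The layout dumps below were obtained roughly like this (a sketch; `source.cpp` is a placeholder for the reproducer above, and a working HIP toolchain is assumed, so the exact driver flags may differ from the Godbolt setup):
```
clang++ -x hip --cuda-host-only   -Xclang -fdump-record-layouts -fsyntax-only source.cpp
clang++ -x hip --cuda-device-only -Xclang -fdump-record-layouts -fsyntax-only source.cpp
```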
With `--cuda-host-only`:
```
*** Dumping AST Record Layout
         0 | class (lambda at <source>:23:12)
         0 |   const float *
         8 |   const float
        16 |   float *
        24 |   const float *
           | [sizeof=32, dsize=32, align=8,
           |  nvsize=32, nvalign=8]
```
With `--cuda-device-only`:
```
*** Dumping AST Record Layout
         0 | class (lambda at <source>:23:12)
         0 |   const float * 
         8 |   float * 
        16 |   const float * 
           | [sizeof=24, dsize=24, align=8,
           |  nvsize=24, nvalign=8]
```
Godbolt: https://cuda.godbolt.org/z/KE789sevs.

When you compile the exact same code for CUDA, this does not happen. However, if you add
```c++
template <typename T, std::enable_if_t<std::is_arithmetic<T>::value, int> = 0>
__host__ T max(const T a, const T b) {
    return std::max(a, b);
}
```
after line 3 of the code at the top, you get the exact same layout discrepancy as with HIP. See https://cuda.godbolt.org/z/e3Ybr4hK1.
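For concreteness, the top of the CUDA variant then looks roughly like this (a sketch; the explicit `#include <type_traits>` for `std::enable_if_t`/`std::is_arithmetic` is my addition, everything else is taken from the snippets above):
```c++
#include <algorithm>
#include <type_traits>  // assumed: for std::enable_if_t / std::is_arithmetic

using std::max;

// Extra overload inserted after line 3; its presence makes the CUDA build
// show the same host/device layout discrepancy as HIP.
template <typename T, std::enable_if_t<std::is_arithmetic<T>::value, int> = 0>
__host__ T max(const T a, const T b) {
    return std::max(a, b);
}

// kernel() and test() are unchanged from the listing at the top.
```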

I can replace the `[=]` with `[fl, A, Vf]`. If that `__host__ T max` overload is present, the compiler then tells me that _variable 'small' cannot be implicitly captured in a lambda with no capture-default specified_; if I leave the overload out, that error is not emitted.
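The explicit-capture variant looks like this (a sketch; only the capture list differs from the original `test`):
```c++
void test(float const * fl, float const * A, float * Vf)
{
  float constexpr small(1.0e-25);

  // 'small' is deliberately not named in the capture list. With the
  // __host__ T max overload present, host compilation rejects the lambda
  // ("variable 'small' cannot be implicitly captured..."); without the
  // overload, no diagnostic is emitted.
  auto f = [fl, A, Vf] __device__ __host__ (unsigned int n) {
    float const value = max(small, fl[0]);
    Vf[0] = value * A[0];
  };
  kernel<<<1,1>>>(f);
}
```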
</pre>
<img width="1" height="1" alt="" src="http://email.email.llvm.org/o/eJzMVs1u4zgSfhr6UrAhk5atHHxQ7PZ2MHto7GRndk4CJZUs7lCkQFJOe55-UZQUtz1B0McNBMQk6_erX-m9OhvEPUufWXpcyCG01u27PwdnzaK09XV_sMarGh2EFqGxWts3Zc5QqgC2ga8v36CyNTKRsyR-22T8Ksaf6UtyxoUylR5qBCYOUp-tU6HtmPgysgyeBPpQkxCRd_I7E8_jU8Cu1zIgE4dw7dHIDuE0Mvogg6rgYlUNRXHWtpS6KFiS_4nOoGY8O0HD-BMJ2pE4OmVruhml746jjighoA-MZ422MkBljQ_AeA6NZvwAj7f57ZKOvz2ouVHj996B76Qmc9arBJc8vRmQ5AByCBYaYOIIFAJxZOkRiqLGi6qwKKAoWutDUQDj2WBisGpQJoBh_AkmjQB3Jl6kHjCKJCh5NhlAJrP0OWHp8d0GYv2tmW4jy8QcvZyIJ0oCbPo5Yl9I79ERbF79hZbgbaJV4kiSbpc63vLn21X-t5sRxJtdUxTFYfzWjB_WFPjxGzXdBXLOO5bkry0C2yb3Vm4TaKTSHpSJqUy4LimzobJdr7QMyhoohwB-qCrE-p1yDMbfaFfw2ioPHUrjIbQyRGItr3aIpUEnH9xQBfBXE1okX2tonO0myq6sJSjSQ4FTPqAJUGJ4QzTRPpCmntRT_LwF1cDVDjB4BGmus5pK9mFwWMNFOiVLjR7sj7YD2U4SQotXeFNaU6oEOTsogyRR3nY4i7ShRXeTt4I_7ACVNCC1t1E_2ybL_1RamjMsm3ro-qXDyrp6OULgCfFgwSNGdFbw5aJqNEFfowJqEXONEOmsigDppQuf-qavs4MRptk9ip6x4QPnVyzJf1ehjVYvq6GWy5gAJIqSRuQPOcT49MFx6HpqUPmvr_Cv6CH8M3o41U_8S4DtDlBp6T2V6hRbahDi4O3gKoyZm3PBRL7mY8N45I7Ve-srP1JkkQLuaW7v6-30fs_NNx-xPYiGSMPS56kWxVFwArOm8_tJanU2TBwzxg_EHeWayx2Nudyo0seifEB_Kqr_J_w_hyn7EOFH_D-S8Agv3_wI73h6hPc-ODegR-pPgf6HrUurAxM5tCH0nrDlJ8ZPhPvqPL6urDszfvqL8dMvX3bZk8eLX41D6fcWTWwyY6_DWEr4XVIfo_lL4x4a6-Dw72M-NhXlobboY-21su_RrOCrfcMLOiKYmpas648XhHnKw49j_jV2vHkpQEOlX6imCBTR-Vr5QsZlAoOqmDi8jlFmIo9zLCo3gYkvcbYl4-bwPlNfpwE5Bu0VJNHPh_J-wDoMgzP3WwrPIkf5ySiSTUAHWhkE8d7RCMBpXATbkwiC54zhEeppmNTKVw57aaorSA9vVElfX76t4FfEn4sxij9Kt2l_WU8xfomt3GGvZYVzO35fQKgdv03lytLncQmKoaZdIT7TYFLNOPbYNnmAdJuAvaDTVtaxnTv0aEKMRoCAWtPYHJmL977P-G7aVHZkHeVSiaC6XqtK0dR4nwTKgJzHZ7TT2PlxWWMjBx3A91ipRmFdzINBNfACGuUFgUCN2m9WhlsG-9a-jc_onHXQoffyjKtFvRf1k3iSC9yvdxu-SdIdzxbtPpMJNukGn8QWG77dNmlWY1Y-cbFNRYbpQu15wtNErJ_WO5Hy3SrJ1ulmuy53leDpmq_ZJsFOKr3S-tJR3BbK-wH3a8GTbbbQskTt5yXd7YlqWQ5nzzaJVj74G19QQeN-LM3T15dv1AYmqCaIKA1v2-lt7P7EErIYnN7fJ9xZhXYoV5XtGD-RFdO_Ze_sf7EKjJ-iK57x0-TNZc__FwAA__9W8sxQ">