<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/56972>56972</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Clang fails to apply HALO when coroutine is destroyed from await_resume
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          jacobsa
      </td>
    </tr>
</table>

<pre>
Here is a small example of a lazy-start coroutine promise: a coroutine `Bar` calls another coroutine, `Baz`, that can be inlined, and the inlined `Baz` in turn calls a third coroutine, `Qux`, that cannot.

```c++
#include <coroutine>

struct MyTask{
  struct promise_type {
    MyTask get_return_object() { return {std::coroutine_handle<promise_type>::from_promise(*this)}; }
    std::suspend_always initial_suspend() { return {}; }

    void unhandled_exception();

    void return_void() {} 

    auto await_transform(MyTask task) {
      struct Awaiter {
        bool await_ready() { return false; }
        std::coroutine_handle<promise_type> await_suspend(std::coroutine_handle<promise_type> h) {
          callee.resume_when_done = h;  // the callee's final_suspend resumes h
          return std::coroutine_handle<promise_type>::from_promise(callee);
        }

        void await_resume() {
          std::coroutine_handle<promise_type>::from_promise(callee).destroy();
        }

        promise_type& caller;
        promise_type& callee;
      };

      return Awaiter{*this, task.handle.promise()};
    }
    
    auto final_suspend() noexcept {
      struct Awaiter {
        bool await_ready() noexcept { return false; }
        std::coroutine_handle<promise_type> await_suspend(std::coroutine_handle<promise_type> h) noexcept {
          return to_resume;
        }

        void await_resume() noexcept;

        std::coroutine_handle<promise_type> to_resume;
      };

      return Awaiter{resume_when_done};
    }

    // The coroutine to resume when we're done.
    std::coroutine_handle<promise_type> resume_when_done;
  };


  // A handle for the coroutine that returned this task.
  std::coroutine_handle<promise_type> handle;
};

MyTask __attribute__((noinline)) Qux() { co_return; }

MyTask Baz() { co_await Qux(); }

MyTask __attribute__((noinline)) Bar() { co_await Baz(); }
```

The awaited task's coroutine handle is destroyed immediately upon resumption, in `await_resume`, so it should be possible to apply HALO (coroutine heap allocation elision) and elide the allocation of `Baz`'s coroutine frame within `Bar`: the frame is both created and destroyed inside `Bar`. But when compiled with `-std=c++20 -O2 -fno-exceptions` ([compiler explorer](https://godbolt.org/z/GT5dEY1Mb)), clang fails to do this; note the call to `operator new` in `Bar.resume`:

```asm
Bar() [clone .resume]:                         # @Bar() [clone .resume]
        push    r14
        push    rbx
        push    rax
        mov     rbx, rdi
        cmp     byte ptr [rdi + 48], 0
        je      .LBB7_1
        mov     rdi, qword ptr [rbx + 32]
        call    operator delete(void*)@PLT
        mov     rdi, qword ptr [rbx + 16]
        mov     qword ptr [rbx + 40], rdi
        mov     qword ptr [rbx + 24], rdi
        mov     qword ptr [rbx], 0
        add     rsp, 8
        pop     rbx
        pop     r14
        jmp     qword ptr [rdi]                 # TAILCALL
.LBB7_1:
        mov     edi, 56
        call    operator new(unsigned long)@PLT
        mov     r14, rax
        mov     qword ptr [rbx + 32], rax
        lea     rax, [rip + Baz() [clone .resume]]
        mov     qword ptr [r14], rax
        lea     rax, [rip + Baz() [clone .destroy]]
        mov     qword ptr [r14 + 8], rax
        mov     qword ptr [r14 + 16], 0
        mov     byte ptr [r14 + 48], 0
        mov     byte ptr [rbx + 48], 1
        mov     qword ptr [rbx + 16], rbx
        call    Qux()
        mov     qword ptr [r14 + 32], rax
        mov     byte ptr [r14 + 48], 1
        mov     qword ptr [r14 + 16], r14
        mov     rdi, rax
        add     rsp, 8
        pop     rbx
        pop     r14
        jmp     qword ptr [rax]                 # TAILCALL
```

If, on the other hand, we make the [minimal change](https://gist.githubusercontent.com/jacobsa/7688e09e655744c57da9f0b8a4a02727/raw/361ac408e580d57edb9340544522535b1dfc6faf/gistfile1.txt) so that the callee's handle is destroyed in `~MyTask` ([compiler explorer](https://godbolt.org/z/EK4nMsvdY)), HALO is applied and there are [no allocations](https://gist.githubusercontent.com/jacobsa/447df5606892d7729898795a68e009a1/raw/302344cb38e1194fa68b178324acb4aa242ae112/gistfile1.txt) within `Bar.resume`.

**Can clang be taught to apply HALO when the awaited coroutine's frame is destroyed in the awaiter's `await_resume` method?**

---

If you are interested in why I don't want to destroy the handle in `~MyTask`, it's because doing so causes much worse code to be generated for the frame's destroy function. For example, here's what `Bar.destroy` looks like in my original example, where I destroy the handle in `await_resume`:

```asm
jmp     operator delete(void*)@PLT                      # TAILCALL
```

But when the handle is destroyed in `~MyTask`, a lot more code is generated:

```asm
        push    rbx
        mov     rbx, rdi
        cmp     qword ptr [rdi], 0
        je      .LBB9_5
        cmp     byte ptr [rbx + 96], 0
        je      .LBB9_5
        cmp     qword ptr [rbx + 24], 0
        je      .LBB9_5
        cmp     byte ptr [rbx + 72], 0
        je      .LBB9_5
        mov     rdi, qword ptr [rbx + 56]
        call    qword ptr [rdi + 8]
.LBB9_5:
        mov     rdi, rbx
        pop     rbx
        jmp     operator delete(void*)@PLT                      # TAILCALL
```

I believe clang is forced to do this, since it must do different cleanup depending on where `Bar` is suspended when it is destroyed. But my library doesn't support destroying it anywhere but the final suspend point, so all of this code is unnecessary and we really do just want a tail call to `delete`.
</pre>