<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/57638>57638</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [Clang] stack frame is way too large in coroutine at low optimization levels
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          jacobsa
      </td>
    </tr>
</table>

<pre>
    Our internal build system just shipped opaque pointers by removing `-Xclang=-no-opaque-pointers` from our build arguments. When this happened, I noticed a regression: **at the default optimization level, clang makes coroutine resume function stack frames much larger than necessary**.

Here is a simple program with a function `ArrayOnCoroutineFrame` that creates a large local array that must go on the coroutine frame because it may need to survive a suspension:

```c++
#include <array>
#include <coroutine>
#include <optional>

struct MyTask {
  struct promise_type {
    MyTask get_return_object() { return {std::coroutine_handle<promise_type>::from_promise(*this)}; }
    std::suspend_always initial_suspend() { return {}; }

    void unhandled_exception();
    void return_void() {} 

    auto await_transform(MyTask task) {
      struct Awaiter {
        bool await_ready() { return false; }
        std::coroutine_handle<promise_type> await_suspend(std::coroutine_handle<promise_type> h) {
          caller.resume_when_done = h;
          return std::coroutine_handle<promise_type>::from_promise(callee);
        }

        void await_resume() {
          std::coroutine_handle<promise_type>::from_promise(callee).destroy();
        }

        promise_type& caller;
        promise_type& callee;
      };

      return Awaiter{*this, task.handle.promise()};
    }
    
    auto final_suspend() noexcept {
      struct Awaiter {
        bool await_ready() noexcept { return false; }
        std::coroutine_handle<promise_type> await_suspend(std::coroutine_handle<promise_type> h) noexcept {
          return to_resume;
        }

        void await_resume() noexcept;

        std::coroutine_handle<promise_type> to_resume;
      };

      return Awaiter{resume_when_done};
    }

    // The coroutine to resume when we're done.
    std::coroutine_handle<promise_type> resume_when_done;
  };


  // A handle for the coroutine that returned this task.
  std::coroutine_handle<promise_type> handle;
};

MyTask DoSomething();

MyTask ArrayOnCoroutineFrame() {
  std::array<std::optional<int>, 10'000> vals;
  for (auto& val : vals) {
    (void)val;
    co_await DoSomething();
  }
}
```

When [compiled with](https://godbolt.org/z/9819jWE9h) `-std=c++20 -Xclang=-no-opaque-pointers`, clang correctly observes that `ArrayOnCoroutineFrame.resume` needs only a small stack frame, since the array lives on the coroutine frame:

```asm
ArrayOnCoroutineFrame() [clone .resume]:      # @ArrayOnCoroutineFrame() [clone .resume]
        push    rbp
        mov     rbp, rsp
        sub     rsp, 304
        mov     qword ptr [rbp - 168], rdi      # 8-byte Spill
        mov     qword ptr [rbp - 8], rdi
[...]
```

But when you [compile with](https://godbolt.org/z/756a3d43f) just `-std=c++20` (so opaque pointers are enabled), it fails to do this and gives the function a stack frame of roughly 80 KB, even though it does seem to construct the array on the coroutine frame:

```asm
ArrayOnCoroutineFrame() [clone .resume]:      # @ArrayOnCoroutineFrame() [clone .resume]
        push    rbp
        mov     rbp, rsp
        sub     rsp, 80368
        mov     qword ptr [rbp - 80240], rdi    # 8-byte Spill
        mov     qword ptr [rbp - 8], rdi
        mov     rax, rdi
        add     rax, 80081
        mov     qword ptr [rbp - 80232], rax    # 8-byte Spill
        mov     rax, rdi
        add     rax, 80
        mov     qword ptr [rbp - 80224], rax    # 8-byte Spill
[...]
        mov     rdi, qword ptr [rbp - 80224]    # 8-byte Reload
        call    std::array<std::optional<int>, 10000ul>::array() [base object constructor]
[...]
```

As far as I can tell, clang has reserved approximately enough space for _two_ copies of the array on the stack?
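
For scale, here is the arithmetic behind that observation as a small compile-time sketch. It assumes a typical x86-64 toolchain where `std::optional<int>` occupies 8 bytes (the size is implementation-defined), takes the two stack sizes from the `sub rsp, N` lines in the listings above, and uses illustrative constant names. Under that assumption the array is 80,000 bytes, and the opaque-pointer build reserves 80,064 more bytes of stack than the typed-pointer build: room for about one extra copy of the array on the stack, on top of the copy that appears to be constructed at offset 80 in the coroutine frame (`rdi + 80` passed to the `array()` constructor).

```c++
#include <array>
#include <cstddef>
#include <optional>

// Bytes occupied by the local array in ArrayOnCoroutineFrame.
// Implementation-defined; 8 bytes per std::optional<int> on common
// x86-64 toolchains (a 4-byte int plus an engaged flag, padded).
constexpr std::size_t kArrayBytes =
    sizeof(std::array<std::optional<int>, 10'000>);

// Stack reservations copied from the `sub rsp, N` instructions above.
constexpr std::size_t kStackTypedPointers = 304;
constexpr std::size_t kStackOpaquePointers = 80368;

// Extra stack space reserved once opaque pointers are enabled.
constexpr std::size_t kExtraStack = kStackOpaquePointers - kStackTypedPointers;

// Number of whole copies of the array that the extra space could hold.
constexpr std::size_t kArrayCopiesInExtraStack = kExtraStack / kArrayBytes;
```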

This is not a bug so much as a missed optimization. I don't know if it's reasonable to expect this optimization to be done at the default optimization level, but it was done before opaque pointers shipped, so it's sort of a regression. Is it easy to make it work again?
</pre>