<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - [coroutines] coroutine frame layout doesn't reuse storage for local variables with non-overlapping lifetimes"
   href="https://bugs.llvm.org/show_bug.cgi?id=45566">45566</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>[coroutines] coroutine frame layout doesn't reuse storage for local variables with non-overlapping lifetimes
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>clang
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>C++2a
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedclangbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>lewissbaker@gmail.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>blitzrakete@gmail.com, erik.pilkington@gmail.com, llvm-bugs@lists.llvm.org, richard-llvm@metafoo.co.uk
          </td>
        </tr></table>
      <p>
        <div>
        <pre>The calculation of the coroutine-frame layout does not seem to take advantage
of the fact that lifetimes of some local variables do not overlap and so can
share storage. This leads to larger coroutine frame sizes than is necessary.

For example, see this compiler-explorer example: <a href="https://godbolt.org/z/JjRnoQ">https://godbolt.org/z/JjRnoQ</a>

This shows the following code (as of clang 10/trunk with -O2 or -O3)

------
#include <cppcoro/task.hpp>

struct big_structure {
    big_structure();
    char dummy[500];
};

void process(big_structure&);
void process2(big_structure&);

cppcoro::task<void> something();

cppcoro::task<void> simple() {
    // 552 byte coroutine frame
    big_structure a;
    process(a);
    co_await something();
}

cppcoro::task<void> alternative_paths(bool cond) {
    // 1072 byte coroutine frame
    if (cond) {
        big_structure a;
        process(a);
        co_await something();
    } else {
        big_structure b;
        process2(b);
        co_await something();
    }
}

cppcoro::task<void> consecutive_paths() {
    // 1072 byte coroutine frame
    {
        big_structure a;
        process(a);
        co_await something();
    }
    {
        big_structure b;
        process2(b);
        co_await something();
    }
}

void normal_function(bool cond) {
    // 512 byte stack frame
    if (cond) {
        big_structure a;
        process(a);
    } else {
        big_structure b;
        process2(b);
    }
}
-----

The coroutines alternative_paths() and consecutive_paths() both define two
big_structure local variables (500 bytes each).
The lifetime of these variables do not overlap and so the compiler should be
able to reuse the same storage for both
of those variables. However, what we observe with the example above is that the
coroutine frames allocate separate
storage for these two variables, resulting in a coroutine frame size of ~1000
bytes instead of one of ~500 bytes. 

Ideally, the same algorithms that optimise the layout and size of stack-frames
would be applied to the coroutine
frame layout calculation to allow it to reuse storage within the coroutine
frame for local variables whose lifetimes
do not overlap.

We can see this behaviour with the normal_function() example, which, despite
having two local variables of size 500 bytes,
only allocates a stack-frame of 512 bytes, as the lifetimes of these local
variables do not overlap.

Implementing this optimisation would reduce the size of heap-allocations
required for coroutine
frames for many non-trivial coroutine functions.

The current logic for calculating the coroutine frame layout is fairly naive.
See
<a href="https://github.com/llvm/llvm-project/blob/e13a8a1fc56837e2f21b85b89a445fb4f21500d6/llvm/lib/Transforms/Coroutines/CoroFrame.cpp#L574">https://github.com/llvm/llvm-project/blob/e13a8a1fc56837e2f21b85b89a445fb4f21500d6/llvm/lib/Transforms/Coroutines/CoroFrame.cpp#L574</a>

It just iteratively adds each spilled alloca as a field in the coroutine-frrame
struct. It does not take into account the lifetime of the variable for each
alloca.</pre>
        </div>
      </p>


      <hr>
      <span>You are receiving this mail because:</span>

      <ul>
          <li>You are on the CC list for the bug.</li>
      </ul>
    </body>
</html>