[PATCH] D157070: [C++20] [Coroutines] Introduce `@llvm.coro.opt.blocker` to block the may-be-incorrect optimization for awaiter

Thu Aug 10 12:53:27 PDT 2023

rjmccall added a comment.

Hmm.  This is quite interesting.  So we've got two things going on that aren't quite kosher from an LLVM perspective:

- The `alloca`s in the coroutine are potentially deallocated whenever there's a suspend.  Normally this isn't a problem because code in the coroutine can't run when that happens, but:
- The call to `await_suspend` is not really part of the coroutine execution, and this is really sneakily important in a lot of ways.

One of those ways that I, at least, hadn't realized before: the local variables of the `await_suspend` call *must not be allocated into the coroutine frame*, because otherwise they can be deallocated before they're allowed to be.  This works out today because inlining adds lifetime markers to the `alloca`s of the inlined function, and coroutine frame lowering detects when an `alloca` has lifetime markers that don't cross a suspend and leaves it as an `alloca` in the split function.  But with `await_suspend` we're semantically reliant on that; as long as we can inline `await_suspend` into an unlowered coroutine, then if we have any gaps in that "optimization" at all, it can cause a miscompile.

The awaiter temporary has to go into the coroutine frame: its formal lifetime lasts until the end of the full-expression in the coroutine that contains the `co_await`, and of course we also have to call `await_resume()` after resuming from suspension.  So necessarily `await_suspend` has to handle the `this` object being destroyed during the execution of the method: e.g. anything it does after enqueuing the coroutine to be asynchronously resumed is unordered (racing) with the destruction of `this`.  If we inline `await_suspend` — which in general we'd really like to do because a lot of its data flow is just local to the coroutine — we have to be careful about how that's optimized, which ends up being the direct bug here.  But we *also* have to be careful about how anything it might store a reference to might be optimized: the awaiter could captured a pointer to some other local variable from the coroutine, and that variable will be also apparently be deallocated during its `alloca` lifetime.

If marking an `alloca` as escaping (as your current patch does) is really sufficient to fix this bug, we should be good.  For example, it implicitly fixes the problem with captured pointers to other `alloca`s just by the nature of transitive escapes: if a pointer to the awaiter can escape, and the awaiter may be storing a pointer to some other variable A, then the pointer to A can also escape.  I'm concerned that marking an `alloca` as escaping might not be sufficient, though.  The optimizer does have to be a lot more conservative about optimizing accesses to an escaped `alloca`, but there are still *some* optimizations it can do because they can't be detected by well-behaved code.  For example, if the optimizer sees a conditional store to an `alloca` followed by a return, it could certainly still make it unconditional (or eliminate it entirely) — the memory is guaranteed to exist, and there's no valid way to observe the store because there are no memory accesses or synchronizations before it goes out of scope.  In the example, what actually blocks the optimization is that the store is followed by an intrinsic call, which the optimizer has to pessimistically assume can observe the store through the escaped pointer.  So I'm worried that just marking the variable as having escaped isn't sufficient to prevent all mis-optimization here, and the analysis of why it probably won't happen seems really brittle.  Maybe it's fine, though.

What does the coroutine IR actually look like here?  We must be setting up the coroutine frame for the suspension before the call to `await_suspend`, or else it won't be properly ordered before whatever dispatch we do there; but the actual suspension point must come *after* the call to `await_suspend`, or the optimizer's understanding of the control flow will be completely messed up.  I'm wondering if we might need to keep the `await_suspend` call outlined somehow until after coroutine splitting.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157070/new/

https://reviews.llvm.org/D157070