[PATCH] D157070: [C++20] [Coroutines] Introduce `@llvm.coro.opt.blocker` to block the may-be-incorrect optimization for awaiter

John McCall via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 10 22:17:59 PDT 2023

rjmccall added a comment.

> Yes and I don't feel it is scary. The semantics here are clear. So if a miscomputation of such cases happens, it implies that there is a bug either in CoroFrame or in the inlining pass. We'll know how to handle them.

Hmm.  The problem is that both CoroFrame and the inliner treat those as best-effort, because they're just trying to provide optimizations, not preserve a critical semantic property.  With that said, I don't have a better alternative than your suggestion of just fixing bugs as they come up.

> I think we need to believe in other optimizations for the cases you mentioned. Otherwise, not only is the design of LLVM coroutines potentially broken, but LLVM itself is unstable too, since the cases you mentioned are not limited to coroutines. So I feel we have to trust the other optimizations, and if we meet any other bugs, let's fix them.

I think you're misunderstanding my point.  That optimization would be correct in a non-coroutine.  You're relying on stronger properties here than LLVM IR guarantees.

> This works. But I feel this is more or less a policy issue

Not really, no.  Ideally the performance of generated code would never be in conflict with correctness, but when it is, it becomes a clearly subordinate goal.  So the obligation here is to demonstrate that we can still compile code correctly while making the more aggressive representational choices that are enabling those performance benefits.  If we can't demonstrate that, we need to make less aggressive choices.

The transformation in CoroSplit is incorrect according to normal IR semantics because the execution of the call to `async_suspend` can become asynchronous to the execution of the coroutine, breaking normal assumptions about instruction ordering and local allocation.

> And actually, the @llvm.coro.save will be lowered into an update of the index of the coroutine frame.

That isn't all that coro.save has to do — it also has to make sure that spills are written into the frame.  And in fact there's a subtlety there that I'm not sure we're getting right, which again comes back to the special behavior of `await_suspend` and the ways our representation choices defy ordinary IR semantics.

Consider an awaiter that schedules a coroutine to be resumed asynchronously.  It is important that stores that occur before this scheduling in program order remain before it after optimization.  Now, optimizations like Mem2Reg lose information about when loads and stores are performed, but that normally doesn't matter because Mem2Reg has proven that the memory isn't visible non-locally and so memory ordering is irrelevant.  But control flow in a coroutine doesn't really follow the CFG when you've got a non-trivial `await_suspend` — the control flow of the coroutine really stops at the `coro.save` and then picks up again after the `coro.suspend`, and the execution of `await_suspend` is in some sense asynchronous after the `coro.save`.  And this is directly relevant because, when Mem2Reg needs to introduce a phi, that phi will generally appear *after* the stores that went into it, which is to say, not necessarily prior to the scheduling that might happen within `await_suspend`.

Let's make that more concrete.  Suppose that our coroutine looks like this:

  void *var = nullptr;
  co_await someFunction(&var);

And suppose that the awaiter returned by `someFunction` contains the pointer `&var`, and its `await_suspend` does something like `*var_p = malloc(16)` before asynchronously scheduling the coroutine handle.  In the unoptimized code, `var` is just part of the coro frame, and this store happens in its usual program order, and as long as that happens-before the async scheduling, it'll happen-before the resumption of the coroutine and so will be visible when we execute `free(var)`.

But consider what happens under optimization.  After inlining and a little bit of optimization, we'll do the `coro.save`, then call `malloc`, store the result to `var`, do the `coro.suspend`, then load `var`.  Mem2Reg should then directly forward that store to the load, creating a direct value dependency across the `coro.suspend`.  CoroFrame knows how to handle this: it has to spill the value into the coro frame.  But where should it spill?  It cannot spill at the point where the original store was done, because that information is lost.  It cannot spill at the `coro.save`, because the `malloc` call hasn't happened yet.  It cannot wait until the `coro.suspend` to spill, because that would not necessarily be ordered before the async scheduling.  Fortunately, CoroFrame will generally spill values at the point where the value was defined, which in this case happens to be approximately where the original store was, which is great.

However, consider what happens if the store was conditional within `await_suspend`.  Mem2Reg will still do its work, but the value that's live across `coro.suspend` will now be a phi, and that phi will be introduced at the join point in the CFG.  That point is no longer necessarily prior to the async scheduling, and so when CoroFrame spills it, it will insert a store that isn't necessarily ordered before the resumption, and we will have a miscompile.

I am very concerned we are going to be fighting problems like this indefinitely because of the way we are abusing IR.


