[PATCH] D96922: [Coroutine] Check indirect uses of alloca when checking lifetime info

Wed Feb 24 21:29:17 PST 2021

rjmccall added a comment.

In D96922#2583920 <https://reviews.llvm.org/D96922#2583920>, @lxfind wrote:

> In D96922#2581141 <https://reviews.llvm.org/D96922#2581141>, @rjmccall wrote:
>
>> Is there any way we can simplify this problem for ourselves by expecting IR to be emitted differently within coroutines?  Because I think the true fundamental problem here is that LLVM's representation of stack allocations is not very good, and we're stuck trying to do an analysis that the representation isn't suited for at all.  There's no real chance that LLVM in general is going to move away from `alloca`, but if we can recognize when we're emitting one of these allocations that needs to overlap `coro.begin` or `coro.end` and just do it with a special intrinsic that we treat differently in lowering, that might substantially reduce the difficulty of the problem.  If necessary, we can even change the rules for inlining into unsplit coroutines.
>
> A special alloca for coroutines will certainly help but it's not sufficient. A special alloca and a set of special rules around data allocation in coroutines can help resolve the escape-before-coro.begin issue (though I haven't seen any new bugs around that).

Well, even if we haven't seen any new bugs, a more robust solution would be welcome; it's hard to feel confident in what's happening there.

> There are other problems that need to be solved. I need some more time to think about how to put them all together.
> I think we are doing OK identifying variables that need to be on the heap (a conservative approach will do just that).
> But it gets complex when we also need to be able to identify a set of variables that MUST be put on the stack. These are the variables that are being used after a coroutine is suspended (i.e. coro.suspend switch goes to the default destination), at which point the coroutine frame may have been destroyed.

Yeah, I agree with this basic statement of the problem.  If we can figure out a way to reliably recognize the allocations that have to be on the stack but are really just meant to be local to various special computations which aren't fully part of the coroutine, that would make this much easier.

I'm not sure I understand what this allocation is that's only being used after the coroutine frame is destroyed.  Where is it coming from?  If it's actually just a local allocation in a function that we're inlining into the epilogue (a destructor?), can we reasonably delay that inlining to only happen after splitting?  I think in any case we'll need to limit early inlining in those sections if we want to have stronger structural rules about allocation for them; either that or we'll need to be able to introduce our stronger rules during inlining.

> In the example I had above, the coroutine return object is allocated and returned in the very end. Since it's the return object, it has to be on the stack. But the compiler isn't powerful enough to tell that for two reasons:
>
> 1. It appears that it lives through coro.suspend, but it really only is used at the suspend path, not any other paths.
> 2. It's constructed by calling a (sret) struct pointer, which suggests that it may capture, further complicating the problem.

I don't quite understand what you're saying here.  If we're returning the return object indirectly (this is the return object from the ramp function, i.e. the coroutine value itself, right?), we should be constructing it in-place, and there's no local allocation for it.  Are we constructing it into a temporary and then copying that (with memcpy? or a copy constructor?) for the actual return?

> I think a proper solution might require introducing a new IR instruction for coroutine suspension, instead of relying on intrinsics. I have similar conclusions in another bug that involves LICM and coroutine.

Hmm.  I can certainly believe that some of the code-motion passes need to understand that they can't always move things around suspensions.  One problem with actually changing to an instruction, though, is that coroutine suspension actually looks quite different in the different lowerings.

> To make this patch fully correct, we need to first check PI.isEscaped() before relying on lifetime information.

Yeah, this is the main thing I was trying to point out.

> Then for the return object case to work, we have two short-term solutions:
>
> 1. We can treat sret as nocapture. This isn't 100% accurate. But the only case where C++ can capture a sret pointer is the constructor captures "this". But even that happens, in order for the "this" to really escape the object needs to be used again. So I think we should be safe there.

Once it's constructed and has possibly escaped, basically any side-effectful thing might indirectly access it.  Usually functions don't have a problem with captures of this for the return value because constructing the return value is the last thing that happens in the function, except for a few destructors.  That might be legitimately different in C++-style coroutines.

> 2. We can special handle the return object variable, and make sure it's always on the stack. This should also work, I think.

Yes, if Clang is emitting this return object variable as part of emitting the return from the ramp function, it should be easy to force it to use a different pattern, or at least mark it with some metadata.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96922/new/

https://reviews.llvm.org/D96922