[PATCH] D86859: [Coroutine] Make dealing with alloca spills more robust

Tue Sep 8 09:48:05 PDT 2020

hoy added inline comments.

================
Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:634-638
+// TODO: If the pointer is really escaped, we are in big trouble because we
+// will be escaping a pointer to a stack address that would no longer exist
+// soon. However most escape analysis isn't good enough to precisely tell,
+// so we are assuming that if a pointer is escaped that it's written into.
+// TODO: Another potential issue is if we are creating an alias through
----------------
lxfind wrote:
> wenlei wrote:
> > hoy wrote:
> > > hoy wrote:
> > > > lxfind wrote:
> > > > > hoy wrote:
> > > > > > wenlei wrote:
> > > > > > > lxfind wrote:
> > > > > > > > hoy wrote:
> > > > > > > > > lxfind wrote:
> > > > > > > > > > hoy wrote:
> > > > > > > > > > > lxfind wrote:
> > > > > > > > > > > > hoy wrote:
> > > > > > > > > > > > > lxfind wrote:
> > > > > > > > > > > > > > hoy wrote:
> > > > > > > > > > > > > > > This does sound fragile. I'm wondering if this is a defined behavior in the language standard. For a local variable first allocated on the local frame and later on copied into the coroutine frame, all subsequent accesses to it should be redirected to the coroutine frame? If so, can the coroutine frame be allocated ealier? 
> > > > > > > > > > > > > > This is defined behavior in the language. How to manage the data in the heap is implementation details and should not affect program correctness. So it's certainly true that this implementation is not quite right yet. The hard part is that some optimization passes will insert allocas before coro.begin, which makes it difficult to track. A proper implementation should provide strong guarantees.
> > > > > > > > > > > > > > This patch only makes existing algorithm slightly more robust, but does not solve the more fundamental problem, which will take some time to design and implement.
> > > > > > > > > > > > > I see. Is it possible to fix those passes allocating locals before `coro.begin`? If that's too involved, can we move the `coro.begin` to the beginning of a function and redirect all local references to the coro frame? It's basically similar to outline all the function code except for the `coro.begin` call.
> > > > > > > > > > > > The problem is that coro.begin depends on the creation of the frame on the heap, and the creation of the heap sometimes could depend on parameters of the function, which will lead to allocas before coro.begin.
> > > > > > > > > > > > I will need to look into how this is done in the Swift ABI and potentially use that model.
> > > > > > > > > > > How does the frame creation depend on the function local variable values? Does the frame size depend on the values ?
> > > > > > > > > > @hoy My understanding is that you could provide a custom Allocator for the frame creation, which comes from a parameter, which will be moved to a local variable the first thing in the function.
> > > > > > > > > I see. Thanks for the explanation. That sounds like a special use of locals which should have a very short lifetime. For regular local variables, their computation may be able to be deferred until they are allocated on the coroutine frame.
> > > > > > > > I agree. All the cases I have seen so far are Allocas generated for parameters, and there are only simple casts/GEPs happening before coro.begin. So I hope this patch is good for a while, and give us sometime to think about this more systematically.
> > > > > > > > can we move the coro.begin to the beginning of a function and redirect all local references to the coro frame
> > > > > > > 
> > > > > > > This seems promising to me. Relying on pointer not escaping data flow analysis for correctness is a bit scary. 
> > > > > > > 
> > > > > > > We could have something like a canonicalization for the sequence of frame allocation, followed by coro.begin and then user alloca. The idea is still similar to D66230, but without changing the order of the interfering def-use.
> > > > > > Yeah, still trying to figure out what locals could be used before coro.begin. Are the lifetimes used by the coro.begin call all compiler-generated, not user code? If so, the compiler can ensure that all locals are allocated on the coroutine frame before they are used.
> > > > > @wenlei @hoy 
> > > > > The frame allocation (and hence coro.begin) is supposed to be the first thing that's happening in the coroutine function. However, there is one complication.
> > > > > The frame allocation can use a user-defined placement form of `operator new` to allocate the memory, which can take arguments from the arguments list in the function (refer to https://en.cppreference.com/w/cpp/language/coroutines, the Heap Allocation section).
> > > > > Allocas can then be introduced when trying to pass parameters of the function into the frame allocation call. When other optimization passes are involved, sometimes alias of the allocas for the parameters can be created which are used latter after coro.begin.
> > > > > In theory, the frame allocation call can also have side-effects to the parameters, which are already stored in Allocas.
> > > > > So in the current model without any major redesign, I don't think there is a perfect solution. We can only make a best-effort try to copy anything that might have been modified and recreate any pointer that points to the stack.
> > > > > Assuming that all the complications are introduced due to parameter passing, I do think that the pointer analysis is powerful enough to handle all cases that's reasonable. But to make it truly robust, we would need to do something entirely different.
> > > > So sounds like only parameter locations need to be seen by `coro.begin`. We might be able to allow special temporary allocas for the sake of `coro.begin` while allocating real parameter and other locals on the coroutine frame directly.
> > > Talked offline, current `coro.begin` intrinsic call doesn't kill values crossing it, thus it doesn't prevent optimization passes from setting a value before it and using the value after it. This is also not modeled by any IR intrinsic attribute either. I guess we can assign `coro.begin` an existing attribute to prevent any function call being moved across it to minimize the escaped case.
> > > 
> > > The patch looks a decent workaround to me without introducing fundamental changes to the current coroutine implementation model. 
> > > The frame allocation can use a user-defined placement form of operator new to allocate the memory, which can take arguments from the arguments list in the function
> > 
> > Ok, so having some sequence before coro.begin may be required for frame allocation on heap via custom allocator. But that isn't problematic by itself, as long as nothing from frame allocation will cross coro.begin, right? And frame allocation should be a well structured sequence, wondering what would cause something to be used by both frame allocation and after coro.begin?
> > 
> > For this case below you mentioned in comment, assuming neither %a nor %b is used in frame allocation, would moving coro.begin above them (but still below frame allocation sequence) solve the problem? I can see that your fix makes things better, but trying to understand its overlap/tradeoff against the move approach (improving Gor's original fix to avoid breaking def-use chain).
> > 
> > ```
> > %a = AllocaInst ...
> > %b = call @computeAddress(... %a)
> > ```
> So far the only problematic cases I know of are when the parameters of the function are localized though AllocaInst before coro.begin, and then used after coro.begin. This can happen when the custom allocator needs to use those parameters or some other optimization passes inserted AllocaInst in the front for whatever reason.
> I don't actually think the counter-example in my comment (%b = call @computeAddress(... %a)) would really happen today, otherwise this solution won't work. I put down the comment just to note what the current solution cannot cover in case we hit it in the future. If this does happen, it would still most likely be due to the custom allocator (say the allocation call is inlined, and latter some optimization pass makes some of the local variables to be shared across coro.begin boundary)
I'm also wondering maybe we should assert for the cases we've not seen and cannot handle so far, i.e., escaped local addresses before `coro.begin`.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86859/new/

https://reviews.llvm.org/D86859