[PATCH] D101980: [RFC] [Coroutines] Put the contents of byval argument to the frame

Tue May 11 05:07:23 PDT 2021

ChuanqiXu added a comment.

In D101980#2749774 <https://reviews.llvm.org/D101980#2749774>, @rjmccall wrote:

> In D101980#2743798 <https://reviews.llvm.org/D101980#2743798>, @ChuanqiXu wrote:
>
>> In D101980#2743665 <https://reviews.llvm.org/D101980#2743665>, @rjmccall wrote:
>>
>>> How is the situation with `byval` different from the situation with indirect arguments?  It sounds like we generally need to perform this extra copy into the coroutine frame once it's allocated, and yes, that's something we need to emit in the frontend because it might be non-trivial.
>>>
>>> Or is the problem that LLVM might be helpfully eliminating the copy when the source is a `byval` parameter?  But that's also something it could be doing with indirect arguments, so again, treating `byval` specially seems suspect.
>>
>> Sorry. May I ask what does indirect argument mean? I tried to copy all byval parameters since it looks like if the frontend needs to pass parameters by value for structs, it would emit `byval` attributes.
>
> An indirect argument is one that's implicitly passed by reference.  For example, most x86-64 ABIs will pass a struct like `struct { char x[1024]; };` as a pointer to a temporary instead of on the stack.  Normally, functions that take such arguments just bind the parameter variable directly to the pointer that's passed in; if that's a problem for coroutines, you're going to need to force a copy there as well.
>
> Indirect arguments are also used (on pretty much every target except Windows) for non-trivially-copyable class types.  That gets tricky because unfortunately you can't just emit a correct move-construction during coroutine lowering; we'll have to do it in the frontend, if we aren't already.
>
> That brings up a related problem: there are LLVM optimizations that assume things about parameters during a function.  For example, the pointers passed as indirect arguments are generally assumed not to alias anything, which means that we can generally eliminate redundant things like, say, copies out of them that are perfectly happy to just use the original memory.  That's not a problem for copies emitted during coroutine lowering, since we'll immediately split the function and make the coroutine non-redundant from LLVM's perspective, but I am worried that we might trigger these optimizations on copies we emit earlier, and that could mess things up.
>
> For example, consider code like this:
>
>   struct A {
>     char buffer[32];
>     A(const A &);
>   };
>   
>   A::A(const A &other) {
>     memcpy(buffer, other.buffer, sizeof(buffer));
>   }
>   
>   my_coroutine foo(A a) {
>     ...
>   }
>
> If we emit the copy of `a` in the frontend — which I think we have no choice but to do — it'll generate IR something like this:
>
>   define ... @foo(%struct.A* %0) {
>     %a = alloca %struct.A
>     call void @llvm.memcpy(%a, %0, ...)
>     ...
>   }
>
> If the LLVM optimizer sees this, it very well might just remove the memcpy and use %0 everywhere instead of %a, and then we'll miscompile the coroutine.  I'm not sure of the best way to fix this; maybe we just have to disable those optimizations on unlowered coroutines.
>
> I'm asking these questions, not to insist that we fix them all at once, but to make sure that we're setting ourselves up to fix them all and not just doing something short-term that ends up needing to be completely replaced with a more comprehensive fix.  The central design questions to me are (1) do we do this during coro splitting or in the frontend and (2) to the extent that we don't do it solely during coro splitting, how do we protect against miscompilation?

Thanks for the detailed explanation! I still had some questions.

> An indirect argument is one that's implicitly passed by reference

Do you mean `by value` here? If an argument is passed by reference, it makes no sense the coroutine need to copy the argument by value. Since the reference is pointer really. And I think it is responsibility for the programmers to maintain the lifetime for the pointer. And the programmers should agree on there is only pointer in the coroutine frame.

> For example, most x86-64 ABIs will pass a struct like struct { char x[1024]; }; as a pointer to a temporary instead of on the stack. Normally, functions that take such arguments just bind the parameter variable directly to the pointer that's passed in; if that's a problem for coroutines, you're going to need to force a copy there as well.

>From the statement, it looks like it is talking about passing argument by value. So I think it means there are some arguments who are passed by value in the language side but didn't be marked with `byval`. Did I get this? If it is true, I agree with that I need to find another method to handle it.

> Indirect arguments are also used (on pretty much every target except Windows) for non-trivially-copyable class types.  That gets tricky because unfortunately you can't just emit a correct move-construction during coroutine lowering; we'll have to do it in the frontend, if we aren't already.

For the case of move-construction, I think the current patch could handle it if the argument is marked with byval. For example:

  coro_type foo(A); // A is move-constructible.
  void bar() {
      A  a;
     // ... some operation
     foo(std::move(A));
  }

And the corresponding IR should look like:

  void bar() {
      %a = alloca A
     ; ... some operation
     %tmp = alloca A
     ; move-construct %tmp
     foo(%tmp)
  }
  coro_type foo(A* [byval] %a) {
     ; if a is marked with byval
     %tmp.a.byval = alloca A
     memcpy(%tmp.a.byval, %a)
  }

It looks OK to me except the extra copy. The move-construction is already handled by the frontend.

> That brings up a related problem: there are LLVM optimizations that assume things about parameters during a function.  For example, the pointers passed as indirect arguments are generally assumed not to alias anything, which means that we can generally eliminate redundant things like, say, copies out of them that are perfectly happy to just use the original memory.  That's not a problem for copies emitted during coroutine lowering, since we'll immediately split the function and make the coroutine non-redundant from LLVM's perspective, but I am worried that we might trigger these optimizations on copies we emit earlier, and that could mess things up.

I am not sure if I get your point right here. From my understanding, this paragraph and the following example tells about the problem we may met if we try to emit the copy for the argument passed by value in the callee site if the callee is a coroutine. Do I understand right? If I got it, I think the problems told here is a little bit hard to handle. I can't find solutions right now.

> I'm asking these questions, not to insist that we fix them all at once, but to make sure that we're setting ourselves up to fix them all and not just doing something short-term that ends up needing to be completely replaced with a more comprehensive fix. The central design questions to me are (1) do we do this during coro splitting or in the frontend and (2) to the extent that we don't do it solely during coro splitting, how do we protect against miscompilation?

Thanks for the good questions! It really helps. And I **can't answer it directly** right now. I'm just sharing my thoughts:
(1) If there are arguments who are passed by value without being marked by `byval`, `inalloca` or anything else, we should mark them in the frontend.
(2) After that, we could copy the arguments passed by value in coroutine. The downside is the extra copy.
(3) I think we could a peephole optimization to emit some of the extra copy in the caller side in the future.

And I think the Swift and MLIR may have the same problem. Since LLVM always emits the copy in the caller side.

And my thoughts about emiting the copy in the callee side in the front are:
(1) I can't find solutions for the questions you asked. And I don't think we can get a clear solution in the near future.
(2) It violates the principle of LLVM. Although I don't know the reason why LLVM choose to copy the argument in the caller side, I believe there must be some fundamental reasons to do so.
(3) It is much more complex to make it in the frontend at least in clang, although I know it isn't a strong argument that the solution is hard : ).
(4) If we decided to do it in the frontend, then we also need to implement it in the swift frontend and MLIR. I know it isn't my responsibility to implement the swift part and MLIR part. But it really looks redundant. From perspective of LLVM, I believe the reason why LLVM decide to implement the coroutine in the middle end is that LLVM want different languages to re-use the same code as much as possible.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101980/new/

https://reviews.llvm.org/D101980