[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

Mon Apr 26 19:13:53 PDT 2021

ychen added a comment.

In D100739#2715964 <https://reviews.llvm.org/D100739#2715964>, @ChuanqiXu wrote:

> In D100739#2713579 <https://reviews.llvm.org/D100739#2713579>, @ychen wrote:
>
>> In D100739#2711698 <https://reviews.llvm.org/D100739#2711698>, @ChuanqiXu wrote:
>>
>>>> This is an alternative to D97915 <https://reviews.llvm.org/D97915> which missed proper deallocation of the over-allocated frame. This patch handles both allocations and deallocations.
>>>
>>> If D97915 <https://reviews.llvm.org/D97915> is not needed, we should abandon it.
>>>
>>> For the example shows in D97915 <https://reviews.llvm.org/D97915>, it says:
>>>
>>>   #include <experimental/coroutine>
>>>   #include <iostream>
>>>   #include <stdexcept>
>>>   #include <thread>
>>>   #include <cassert>
>>>   
>>>   struct task{
>>>     struct alignas(64) promise_type {
>>>       task get_return_object() { return {}; }
>>>       std::experimental::suspend_never initial_suspend() { return {}; }
>>>       std::experimental::suspend_never final_suspend() noexcept { return {}; }
>>>       void return_void() {}
>>>       void unhandled_exception() {}
>>>     };
>>>     using handle = std::experimental::coroutine_handle<promise_type>;
>>>   };
>>>   
>>>   auto switch_to_new_thread() {
>>>     struct awaitable {
>>>       bool await_ready() { return false; }
>>>       void await_suspend(task::handle h) {
>>>         auto i = reinterpret_cast<std::uintptr_t>(&h.promise());
>>>         std::cout << i << std::endl;
>>>         assert(i % 64 == 0);
>>>       }
>>>       void await_resume() {}
>>>     };
>>>     return awaitable{};
>>>   }
>>>   
>>>   task resuming_on_new_thread() {
>>>     co_await switch_to_new_thread();
>>>   }
>>>   
>>>   int main() {
>>>     resuming_on_new_thread();
>>>   }
>>>
>>> The assertion would fail. If this is the root problem, I think we could adjust the align for the promise alloca like:
>>
>> The problem is that any member of the coroutine frame could be overaligned (thus make the frame overaligned) including promise, local variables, spills. The problem is *not* specific to promise.
>>
>>>   %promise = alloca %promise_type, align 8
>>>
>>> into
>>>
>>>   %promise = alloca %promise_type, align 128
>>>
>>> In other words, if this the problem we need to solve, I think we could make it in a simpler way.
>>
>> This may not fix the problem.
>>
>>> Then I looked into the document you give in the summary. The issue#3 says the frontend can't do some work in the process of template instantiation due to the frontend doesn't know about the align and size of the coroutine. But from the implementation, it looks like not the problem this patch wants to solve.
>>
>> I meant to use that as a reference to help describe the problem (but not the solution). The document itself includes both problem statements (issue#3) and solutions (frontend-based) which are totally unrelated to this patch. It looks like it is not that useful in this case so please disregard that.
>>
>>> I am really confused about the problem. Could you please restate your problem more in more detail? For example, would it make the alignment incorrect like the example above? Or does we want the frontend to get alignment information? Then what would be affected? From the title, I can guess the size of frame would get bigger. But how big would it be? Who would control and determine the final size?
>>
>> understood.
>>
>> There are two kinds of alignments: the alignment of a type/object at compile-time (ABI specified or user-specified), and the alignment the object of that type actually gets during runtime. The compiler assumes that the alignment of a struct is the maximal alignment of all its members. However, that assumption may not be true at runtime where the memory allocator may return a memory block that has insufficient alignment which causes some members aligned incorrectly.
>>
>> For C++ coroutine, right now the default memory allocator could only return 16 bytes aligned memory block. When any member of the coroutine frame (promise, local variables, spills etc.) has alignment > 16, the frame becomes overaligned. This could only be fixed dynamically at runtime: by over-allocating memory and then adjust the frame start address so that it aligns correctly.
>>
>> For example, suppose malloc returns 16 bytes aligned address 16, how do we make it 64 bytes aligned? align 16 up to an address that is 64 bytes aligned which is 64, so the adjustment amount is 64-16=48
>>
>> Another similar example, suppose malloc returns 16 bytes aligned address 32, how do we make it 64 bytes aligned? align 32 up to an address that is 64 bytes aligned which is 64, so the adjustment amount is 64-32=32
>>
>> Another similar example, suppose malloc returns 16 bytes aligned address 48, how do we make it 64 bytes aligned? align 48 up to an address that is 64 bytes aligned which is 64, so the adjustment amount is 64-48=16
>>
>> Another similar example, suppose malloc returns 16 bytes aligned address 64, how do we make it 64 bytes aligned? align 64 up to an address that is 64 bytes aligned which is 64, so the adjustment amount is 64-64=0
>>
>> So the mamximal adjustment amount is 64-16=48. We don't know until runtime if the malloc returned address X is (X % 64 == 0) or (X % 64 == 16) or (X % 64 == 32) or (X % 64 == 48), so we must emit extra code to deal with all cases (by bitwise operations).
>
> Thanks for the explanation. I think I got the problem now. And my understanding for the solution of this patch is, if the align of the original frame is 64, then we allocate (64+48) space  to the new frame now. And the original frame becomes an alloca with align 64 in the new frame. So the actually frame gets the right alignment now. Do I get the problem and solution right?
>
> I am wondering if there is a simpler solution. For example, after we construct the frame, we can get the alignment requirement for the frame. Then if the alignment is bigger than 16, we could lower the `coro.begin` to aligned new instead of default new. It looks like  that the implementation would be much simpler.

Oh, right now C++ coroutine standard is written in the way that the aligned new is not searched by the frontend. The limitation will be lifted in C++23 (hopefully).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739