[llvm] [Coroutines] fix coroutines + std::unique_ptr with async exceptions validation errors (PR #149691)
via llvm-commits
llvm-commits at lists.llvm.org
Mon Dec 29 14:36:28 PST 2025
tzuralon wrote:
> > > > If you find it that critical, I can make the 6 tests have proper scope ends, it should not change the logics I try to mimic, that fails before the patch and passes afterwards.
> > >
> > >
> > > If not hard, I'd like to see it.
> > > To better understanding the problem, (I know you've wrote a lot of things, but it maybe too time consuming or hard to understand all of them, especially I don't use windows), would you like to give some observations for **what part of coro split** makes the IR illegal with an example? Then we can see if we can solve the problem more properly or the current solution is the best.
> > > And it will be helpful to give some explanation to the problem abstractly too. e.g., what is the pattern? And when we makes it illegal?
> > > Sorry for failing o understand your problem. Your description is a bit long and hard for me to understand.
> >
> >
> > Sure, the problem is in CoroFrame.cpp/insertSpills. the main for loop in that function iterates over Spills. When it finds one (in the tests example, the function argument - %0 - a ptr which should be deleted (stored with 0)) at the end of the function in the test. The deletion is done in practice by the instruction: `store i32 0, ptr %0, align 4` After coro-split pass, this instruction turns to:
> > ```
> > 1: %.reload.addr = getelementptr inbounds %"?resuming_on_new_thread@@YA?AUtask@@Vunique_ptr@@@Z.Frame", ptr %5, i32 0, i32 2
> > 2: %.reload = load ptr, ptr %.reload.addr, align 8
> > 3: store i32 0, ptr %.reload, align 4
> > ```
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > Where 1 and 2 are generated by (CoroFrame.cpp/insertSpills):
> > ```
> > CurrentReload = Builder.CreateAlignedLoad(
> > FrameTy->getElementType(FrameData.getFieldIndex(E.first)), GEP,
> > SpillAlignment, E.first->getName() + Twine(".reload"));
> > ```
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > and 3 is generated by: `U->replaceUsesOfWith(Def, CurrentReload);` As you can see, the instructions post-coro-split are dependent on the coro-frame (in line 1), which is allocated only during coro-begin, but as the coro-begin is inside a seh scope, it might get unwound before allocation, therefore the deletion code is not dominating coro-begin anymore (due to unwind path in function entry), and the function is broken in practice. Example in the input test code (before pass):
> > ```
> > define i8 @"?resuming_on_new_thread@@YA?AUtask@@Vunique_ptr@@@Z"(ptr %0) #0 personality ptr null {
> > invoke void @llvm.seh.scope.begin()
> > to label %2 unwind label %14
> > 2: ; preds = %1
> > ; Coro-Begin here...
> > ...
> > ...
> > 14: ; preds = %12, %1
> > %15 = cleanuppad within none []
> > store i32 0, ptr %0, align 4
> > cleanupret from %15 unwind to caller
> > ```
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > That shows that the bug is indeed in coroutines, as the input code is semantically legal (node 14 dominates entry point due to using %0) and illegal after pass (transformed node 14 does not dominate transformed node 2 due to depending on coro-frame)
> > I can post here the code post-coro-split (and before my fix), showing it is indeed broken. the code is hard-to-acquire in +assertions because it crashes before success, maybe with -assertions it's more straight-forward.
>
> If I read correctly, the problem is, `%llvm.coro.begin()` doesn't dominate `%0` but the use of `%0` across the suspend points.
>
> Then my question is, (1) Do we have to emit `llvm.seh.scope.begin()` in the front of `%llvm.coro.begin`? Can we promote `%llvm.coro.begin` to make it dominate `llvm.seh.scope.begin()`? This may be done in the frontend. (2) Does `llvm.seh.scope.begin()` work well naturally with coroutines. I mean, many coroutines are async functions. (3) Or can we simply don't replace the use of `%0` in **ramp** function if it is not dominated by `%llvm.coro.begin()`?
Exactly, but the problem is not restricted only to SEH scopes.
1) Even if SEH scopes issue is handled (for example by changing the dominance of `llvm.coro.begin` over `llvm.seh.scope.begin()` in the front-end, which will break the assertion that the deleter should run on every control flow), due to the fact that in such a setting coro.alloc is also emitted with `invoke` instead of `call`, it reproduces the bug on its own unwind path (namely coro.alloc reproduces the bug even without seh scopes).
2) The case we discuss on is during the `%llvm.coro.begin()`, namely - the creation of the coroutine frame (ramp function). in async functions, the coro-begin will happen in the ramp function, and all other instances will be post-ramp-function.
Once the coro frame is allocated successfully, the spills are already mounted to the coro frame, therefore deleter calls (happy/unwinding) should be safe to use later-on.
3) I think it's unsafe to do it. The insertSpills function is indifferent to the usage of the spills, and it can't differentiate cleanups from other uses. Changing spill logics in the ramp function will cause all spills logics after coro.begin to be done on the ramp copy of the spills. I think that in such scenario, it will work ok on the unwinding case, but spills' data might not be consistent between the ramp and post-suspend in happy flow, which is a corruption in practice.
If you aim to do the fix only on the dominance violator nodes - I think I tried to implement it that way at first, but nodes will be optimized to be reused, therefore a ramp-copy vs frame-copy corruption can still occur if the fix code spares spills reload in nodes which are common to happy flow and unwind flow.
My solution is aimed to generate nodes only to the unwind case (it can be enforced), therefore it's corruption-safe for the happy flow.
https://github.com/llvm/llvm-project/pull/149691
More information about the llvm-commits
mailing list