[llvm-dev] [RFC] Coroutines passes in the new pass manager

Thu Dec 26 06:29:40 PST 2019

Hello all,

It's been a month since my previous email on the topic, and since then
I've done some initial work on porting the coroutines passes to the
new pass manager. In total there are 6 patches -- that's a lot to
review, so allow me to introduce the changes being made in each of
them.

# What's finished

In these first 6 patches, I focused on lowering coroutine intrinsics
correctly. With the patches applied, Clang is able to successfully use
the new pass manager to build and test a major open source C++20
coroutines library, https://github.com/lewissbaker/cppcoro.

1. https://reviews.llvm.org/D71898
New pass manager implementation of the coro-early function pass, with
LLVM regression tests for this pass updated to test both the new and
legacy implementations.

2. https://reviews.llvm.org/D71899
Same thing, but for coro-split CGSCC pass. This patch adds support to
the new pass manager for only the C++20 switch-based coroutines ABI.
I'd like to implement support for the Swift returned-continuation ABI
in a future patch.

3. https://reviews.llvm.org/D71900
4. https://reviews.llvm.org/D71901
Same thing, but for the coro-elide and coro-cleanup function passes.

5. https://reviews.llvm.org/D71902
The first 4 patches allow users to run coroutine passes by invoking,
for example 'opt -passes=coro-early'. However, most of LLVM's tests
for coroutines use 'opt -enable-coroutines', which adds all 4
coroutines passes, in the correct order, to the pass pipeline. This
5th patch adds a similar feature but in a way that can be used by the
new pass manager: a pipeline parser that understands 'opt
-passes=coroutines'.

6. https://reviews.llvm.org/D71903
Finally, this patch modifies Clang to run the new coroutine passes
when the experimental pass manager is being used with coroutines
enabled (either via '-fcoroutines-ts' or '-std=c++2a'). With all 6
patches applied, the cppcoro library builds and tests successfully
with a Clang built with 'ENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=On'.

# What's yet to be done: Swift "returned-continuation" coroutines &
the "devirtualization trigger"

Two things are missing from these initial 6 patches: the first is, as
I mentioned above, support for returned-continuation coroutines, which
are used by Swift. I think this will be fairly straightforward for me
(or others, if interested) to add in an upcoming patch.

The second missing feature has to do with how CGSCC passes are re-run
by the pass manager infrastructure.

The legacy coro-split and coro-elide passes work in tandem: the
coro-split pass first introduces an indirect call to a dummy
"devirtualization trigger" function, and then the coro-elide pass
devirtualizes it. The legacy CGSCC pass manager checks for
devirtualizations after each pass and, if any occur, it re-runs the
CGSCC pass pipeline. In other words: coro-split is run, a check is
made for devirtualization, then coro-elide is run, and another check
is made.

The new pass manager allows for a series of CGSCC passes to be wrapped
in a DevirtSCCRepeatedPass, but this only checks for devirtualizations
after all passes are run. In the case of coro-split and coro-elide,
the indirect function call is added and then devirtualized within a
single pass of the repeater, so DevirtSCCRepeatedPass never sees the
devirtualization and thus doesn't perform a repeat iteration. In other
words: a check is made, coro-split is run and it adds an indirect
call, coro-elide is run and it devirtualizes that call, and then
another check is made. From the repeater pass's point of view, nothing
has changed.

This is something I'd like to tweak in future patches (my thinking is
to add a member to CGSCCUpdateResult to allow passes to manually
inform the pass manager about devirtualizations, but I'm very open to
alternative ideas). But for now, I simply have Clang manually schedule
two iterations of the coro-split pass, rather than have it rely on the
repeater detecting a devirtualization and automatically scheduling
another coro-split iteration. As a result, the new pass manager
implementation of coroutines realizes correct program behavior, but
fails to realize some of the optimization patterns that are tested for
in the regression tests in 'llvm/test/Transforms/Coroutines'.

I'd greatly appreciate code review on my patches! Or, please reply
here with any questions/comments.

- Brian Gesiak

On Mon, Nov 25, 2019 at 8:39 PM Brian Gesiak <modocache at gmail.com> wrote:
>
> Hi all!
>
> I'm working on porting the LLVM passes for C++20 coroutines over to
> the new pass manager infrastructure. Of the 4 passes, 3 are function
> passes, and so porting them is straightforward and easy (my thanks to
> those involved -- the design is great!). However, I'm struggling with
> 'coro-split', which is an SCC pass. Specifically, I'd like advice on
> how to appropriately update the new pass manager's preferred
> representation of the SCC: 'llvm::LazyCallGraph'.
>
> Before I ask my specific questions about 'LazyCallGraph', it may help
> to explain my understanding of the 'coro-split' pass and how it
> modifies the call graph:
>
> The coro-split pass "clones" coroutine functions. For example, for a
> coroutine function 'foo', it creates declarations for 3 new functions
> ('foo.resume', 'foo.destroy', and 'foo.cleanup') using the static
> member function 'llvm::Function::Create'. It then uses
> 'llvm::CloneFunctionInto' to copy the 'foo' function's attributes,
> arguments, basic blocks, and instructions, into each of the 3 new
> functions. Finally, the coro-split pass replaces the entry basic
> blocks of the function to read a value from a global store of
> coroutine state called the coroutine frame, and uses that value to
> determine which part of the cloned coroutine function should be
> executed upon "resumption" of the coroutine.
>
> Of course, once the coro-split SCC pass has done all this, it must
> update LLVM's representation of the call graph. It does so using ~40
> lines of code that you may read here:
> https://github.com/llvm/llvm-project/blob/890c6ef1fb/llvm/lib/Transforms/Coroutines/Coroutines.cpp#L186-L224
>
> To explain my understanding of the code in the above link: the
> coro-split pass completely re-initializes the 'CallGraphSCC' object,
> using the member function
> 'CallGraphSCC::initialize(ArrayRef<CallGraphNode*>)'. The array of
> nodes it uses to re-initialize the SCC includes nodes for each of the
> 3 new functions, which it adds to the call graph using
> 'CallGraph::getOrInsertFunction'. For each of the new nodes, and for
> the node representing the original function, the coro-split pass
> iterates over each of the instructions in the function, and uses
> 'CallGraphNode::addCalledFunction' to add edges to the call graph.
>
> The difficulty I'm having here in porting coro-split to the new pass
> manager is that its SCC passes use 'LazyCallGraph' and
> 'LazyCallGraph::SCC'. These classes' documentation explains that they
> are designed with the constraint that optimization passes shall not
> delete, remove, or add edges that invalidate a bottom-up traversal of
> the SCC DAG. Unfortunately, I understand the coro-split pass to be
> doing exactly those things.
>
> As a result, if I attempt to mimic the coro-split pass's logic by
> inserting functions into the call graph using 'LazyCallGraph::get',
> and then adding call edges with
> 'LazyCallGraph::RefSCC::insertTrivialRefEdge' and
> 'LazyCallGraph::RefSCC::switchInternalEdgeToCall', I'm met with an
> assertion: llvm/lib/Analysis/CGSCCPassManager.cpp:463: [...]:
> Assertion `E && "No function transformations should introduce *new* "
> "call edges! Any new calls should be modeled as " "promoted existing
> ref edges!"' failed.
>
> Does anyone have any suggestions on how I can better work within the
> constraints of the 'LazyCallGraph'?
>
> In case it helps, you can see the code that currently hits this
> assertion here:
> https://github.com/modocache/llvm-project/commit/02c10528e9. Of
> particular interest may be the functions 'buildLazyCallGraphNode' and
> 'updateCallGraphAfterSplit'.
>
> One idea a colleague of mine suggested was to have an earlier
> coroutine function pass insert declarations of 'foo.resume',
> 'foo.destroy', and 'foo.cleanup', which we could then promote to call
> edges. However, they also found this FIXME in CGSCCPassManager.cpp:
> "We should really handle adding new calls. While it will make
> downstream usage more complex, there is no fundamental limitation and
> it will allow passes within the CGSCC to be a bit more flexible in
> what transforms they can do. Until then, we verify that new calls
> haven't been introduced."
>
> As a result, I'm now unsure whether I ought to modify coro-split's
> implementation for the new pass manager, or modify 'CGSCCPassManager'
> to allow for the insertion of new calls. Any and all advice would be
> greatly appreciated!
>
> - Brian Gesiak