[PATCH] D142620: [Coroutines] Improve rematerialization stage
David Stuttard via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Feb 6 10:40:02 PST 2023
dstuttard added a comment.
Thanks for the feedback - see the comments and the updated patch(es)
================
Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:326
+ Instruction *Node;
+ SmallVector<RematNode *> Children;
+ RematNode() = default;
----------------
ChuanqiXu wrote:
> What does `Children` mean here?
I just needed a reasonable name for the next nodes - they are defined as being one edge further away from the root of the graph, so seemed like a reasonable name to use.
Do you think something else would be better?
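
To illustrate the naming, here is a minimal standalone sketch (not code from the patch). `SketchNode` and `maxDepth` are hypothetical names, an `int` stands in for `Instruction *`, and `std::vector` stands in for `SmallVector`. It only shows that each step through `Children` moves one edge further from the root:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the patch's RematNode: an int id replaces
// Instruction *, and std::vector replaces SmallVector.
struct SketchNode {
  int Id;
  std::vector<SketchNode *> Children; // nodes one edge further from the root
};

// Number of edges from Root to the deepest node reachable via Children.
// Assumes the graph is acyclic.
int maxDepth(const SketchNode *Root) {
  int Depth = 0;
  for (const SketchNode *C : Root->Children) {
    int D = 1 + maxDepth(C);
    if (D > Depth)
      Depth = D;
  }
  return Depth;
}
```

For a chain root -> mid -> leaf, `maxDepth` on the root gives 2 and on the leaf gives 0.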
================
Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:360
+ if (materializable(*D) &&
+ Checker.isDefinitionAcrossSuspend(*D, FirstUse)) {
+ if (Remats.count(D)) {
----------------
jsilvanus wrote:
> Maybe the indentation here can be reduced a bit with early exiting out of the outermost if, moving the initialization of D out of the if, merging the two ifs above, and early exiting here as well?
I think I get what you mean - I've updated it with less indenting.
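
The shape of that kind of refactor can be sketched in isolation. This is not the updated patch: `isCandidate`, `crossesSuspend` and `collectNew` are hypothetical stand-ins (the first two for `materializable()` and `Checker.isDefinitionAcrossSuspend()`), and values are plain ints. The point is only that merging the two conditions and exiting early keeps the loop body at one indentation level:

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical predicates; only their role in the control flow matters here.
bool isCandidate(int V) { return V % 2 == 0; }
bool crossesSuspend(int V) { return V > 2; }

// Returns the operands that qualify and were not already recorded in Seen,
// recording them as a side effect.
std::vector<int> collectNew(const std::vector<int> &Operands,
                            std::set<int> &Seen) {
  std::vector<int> New;
  for (int V : Operands) {
    if (!isCandidate(V) || !crossesSuspend(V))
      continue; // merged condition, early exit
    if (!Seen.insert(V).second)
      continue; // already recorded, early exit
    New.push_back(V);
  }
  return New;
}
```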
================
Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:366-372
+ for (auto &I : WorkList) {
+ if (I->Node == D) {
+ NoMatch = false;
+ N->Children.push_back(I.get());
+ break;
+ }
+ }
----------------
sebastian-ne wrote:
> Maybe it makes sense to use a Set for the Worklist?
Maybe - are you thinking that a set would remove the need to check for duplicates? I'm not sure it makes things much better. It might remove the need to iterate over the worklist, but I can't remember whether the worklist has to be processed in order.
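
If order does turn out to matter, one option is a container that keeps insertion order and also supports fast lookup (LLVM's `SetVector` is built on that idea). The following is a standalone sketch of the pattern, not a proposal for the patch: `OrderedWorkList`, `SketchNode` and `getOrCreate` are hypothetical names, and an `int` id stands in for `Instruction *`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

struct SketchNode {
  int Id;
  std::vector<SketchNode *> Children;
};

// Hypothetical worklist: the vector preserves insertion order, while the map
// answers "is there already a node for this id?" without a linear scan.
class OrderedWorkList {
  std::vector<std::unique_ptr<SketchNode>> Items;
  std::unordered_map<int, SketchNode *> ById;

public:
  // Returns the node for Id, creating and appending it on first sight.
  SketchNode *getOrCreate(int Id) {
    auto It = ById.find(Id);
    if (It != ById.end())
      return It->second;
    Items.push_back(std::make_unique<SketchNode>(SketchNode{Id, {}}));
    SketchNode *N = Items.back().get();
    ById.emplace(Id, N);
    return N;
  }

  std::size_t size() const { return Items.size(); }
  SketchNode *at(std::size_t I) const { return Items[I].get(); }
};
```

The trade-off is the extra map kept alongside the vector, in exchange for dropping the scan over the worklist on every operand.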
================
Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:387-388
+
+ inline RematNode **child_begin(RematNode *N) { return N->Children.begin(); }
+ inline RematNode **child_end(RematNode *N) { return N->Children.end(); }
+
----------------
ChuanqiXu wrote:
> Are the 2 methods used?
No - it appears they aren't!
Based on the examples for using the RPOT template I thought they were.
================
Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:2225-2226
// For every use of the value that is across suspend point, recreate that value
// after a suspend point.
+static void rewriteMaterializableInstructions(
----------------
ChuanqiXu wrote:
> The comment looks not precise after we land this patch.
I'm not sure that the result of this patch is any different from what happened before - other than you might get more than 4 dependent instructions rematerialized.
What do you think needs changing here?
================
Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:2248
+ RematGraph *RG = E.second.get();
+ ReversePostOrderTraversal<RematGraph *> RPOT(RG);
+ using rpo_iterator = ReversePostOrderTraversal<RematGraph *>::rpo_iterator;
----------------
ChuanqiXu wrote:
> jsilvanus wrote:
> > ChuanqiXu wrote:
> > > It is expensive to create ReversePostOrderTraversal. So it looks not good to construct it in a loop.
> > Pedantically speaking, I'm not sure constructing the `ReversePostOrderTraversal` in a loop here is an issue: It being "expensive" just means it does the graph traversal in the constructor, so its run time is linear in the size of the graph.
> > But here we are using it to traverse *different* graphs, all of which have been constructed before, so the runtime can be amortized into the construction of those graphs, or also into the traversal that is done later.
> >
> > What we should not do is re-creating `ReversePostOrderTraversal` iterator objects for the same graph in a loop, because that wastes runtime.
> >
> > Still, one might argue that constructing all those graphs with overlapping nodes, i.e. possibly multiple graphs having a node for the same `Instruction*`, is a fundamental runtime issue. Not sure if that really can become an issue?
> Yeah, the key point here is that how many overlapping nodes there is. Have you measured the compile-time, run time performance or memory usages? Then we can have a better feeling. For example, we can decide if we want to limit the depth of the graph then.
This work was done to speed up rematerialization. We needed a lot of rematerialization to happen, and initially just increased the number of iterations from 4 to a larger number.
This didn't work very well and was extremely slow - hence this re-work.
I haven't done timings for smaller amounts of remat, but I can do that if you think it is useful.
I did wonder though if limiting the depth with an option might be useful - we want as much as possible, but that's probably not true for all applications.
I'm not sure about the overlapping nodes actually being an issue here - I did attempt to create a test case that demonstrated this, but I'm not sure I was entirely successful (all the tests ended up with the minimum set for the instructions being rematerialized).
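
For reference on the cost being discussed, a reverse post-order can be computed with a single depth-first walk, so building it once per graph is linear in that graph's size. The sketch below is standalone and is not LLVM's `ReversePostOrderTraversal`; `SketchNode`, `postOrder` and `reversePostOrder` are hypothetical names, and the recursion assumes graphs small enough not to exhaust the stack.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_set>
#include <vector>

struct SketchNode {
  int Id;
  std::vector<SketchNode *> Children;
};

// Depth-first walk that appends a node after all of its children.
static void postOrder(SketchNode *N, std::unordered_set<SketchNode *> &Visited,
                      std::vector<SketchNode *> &Out) {
  if (!Visited.insert(N).second)
    return; // shared child already handled; each node is emitted once
  for (SketchNode *C : N->Children)
    postOrder(C, Visited, Out);
  Out.push_back(N);
}

// Visits every node and edge reachable from Root exactly once, then reverses.
std::vector<SketchNode *> reversePostOrder(SketchNode *Root) {
  std::unordered_set<SketchNode *> Visited;
  std::vector<SketchNode *> Order;
  postOrder(Root, Visited, Order);
  std::reverse(Order.begin(), Order.end());
  return Order;
}
```

In an acyclic graph every node appears before its children in the result, which matches the order in which dependent instructions need to be recreated.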
================
Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:2901-2907
+ // Manually add dbg.value metadata uses of I.
+ SmallVector<DbgValueInst *, 16> DVIs;
+ findDbgValues(DVIs, &I);
+ for (auto *DVI : DVIs)
+ if (Checker.isDefinitionAcrossSuspend(I, DVI))
+ Spills[&I].push_back(DVI);
+ }
----------------
ChuanqiXu wrote:
> This is not good. It may cause the behavior to become inconsistent after we materialize DVI instructions. See https://github.com/llvm/llvm-project/issues/55276 for an example.
I think this is here because I created the original patch on an older version of CoroFrame which did this.
Is removing this the right approach?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D142620/new/
https://reviews.llvm.org/D142620