[PATCH] D142620: [Coroutines] Improve rematerialization stage

Mon Feb 6 10:40:02 PST 2023

dstuttard added a comment.

Thanks for the feedback - see the comments and the updated patch(es).

================
Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:326
+  Instruction *Node;
+  SmallVector<RematNode *> Children;
+  RematNode() = default;
----------------
ChuanqiXu wrote:
> What does `Children` mean here?
I just needed a reasonable name for the next nodes - they are defined as being one edge further away from the root of the graph, so `Children` seemed like a reasonable name to use.
Do you think something else would be better?

================
Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:360
+          if (materializable(*D) &&
+              Checker.isDefinitionAcrossSuspend(*D, FirstUse)) {
+            if (Remats.count(D)) {
----------------
jsilvanus wrote:
> Maybe the indentation here can be reduced a bit with early exiting out of the outermost if, moving the initialization of D out of the if, merging the two ifs above, and early exiting here as well?
I think I get what you mean - I've updated it with less indenting.
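
For readers following along, the early-exit shape suggested above can be sketched roughly as below. This is a standalone illustration, not the code from the patch: `Inst`, `collectNested` and `collectEarlyExit` are hypothetical stand-ins, and the two boolean fields only mimic the role of `materializable(*D)` and `Checker.isDefinitionAcrossSuspend(*D, FirstUse)`. The handling of already-seen entries is also simplified to a plain skip.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for an instruction plus the results of the two checks.
struct Inst {
  int Id;
  bool Materializable;
  bool CrossesSuspend;
};

// Nested form: every condition adds one level of indentation.
std::vector<int> collectNested(const std::vector<const Inst *> &Operands,
                               const std::set<int> &Seen) {
  std::vector<int> Out;
  for (const Inst *D : Operands) {
    if (D) {
      if (D->Materializable && D->CrossesSuspend) {
        if (!Seen.count(D->Id)) {
          Out.push_back(D->Id);
        }
      }
    }
  }
  return Out;
}

// Early-exit form: D is tested up front, the conditions are merged and
// inverted, and the interesting work stays at one indentation level.
std::vector<int> collectEarlyExit(const std::vector<const Inst *> &Operands,
                                  const std::set<int> &Seen) {
  std::vector<int> Out;
  for (const Inst *D : Operands) {
    if (!D || !D->Materializable || !D->CrossesSuspend)
      continue;
    if (Seen.count(D->Id))
      continue;
    Out.push_back(D->Id);
  }
  return Out;
}
```

Both functions are intended to return the same result for the same input; only the control-flow shape differs.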

================
Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:366-372
+              for (auto &I : WorkList) {
+                if (I->Node == D) {
+                  NoMatch = false;
+                  N->Children.push_back(I.get());
+                  break;
+                }
+              }
----------------
sebastian-ne wrote:
> Maybe it makes sense to use a Set for the Worklist?
Maybe - are you thinking that a set would remove the need to check for duplicates? I'm not sure it makes things much better. It might remove the need to iterate over the worklist, but I can't remember whether the worklist has to be processed in order.
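
One way to get both properties discussed here (constant-time duplicate lookup and a stable processing order) is to pair the ordered container with an index. The sketch below is a standalone illustration under that assumption, not the data structure used in the patch; `Instr`, `Node` and `OrderedWorkList` are hypothetical names. Inside LLVM, `SetVector` or `MapVector` from `llvm/ADT` provide a similar combination of uniqueness and insertion order.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for llvm::Instruction.
struct Instr {
  int Id;
};

// Hypothetical stand-in for a rematerialization graph node.
struct Node {
  const Instr *Inst;
  std::vector<Node *> Operands;
};

// A worklist that keeps insertion order and finds existing nodes without
// scanning every entry.
class OrderedWorkList {
  std::vector<std::unique_ptr<Node>> Items;         // insertion order
  std::unordered_map<const Instr *, Node *> Index;  // instruction -> node

public:
  // Returns the node already created for I, or creates and enqueues one.
  Node *getOrCreate(const Instr *I) {
    auto It = Index.find(I);
    if (It != Index.end())
      return It->second;
    Items.push_back(std::make_unique<Node>(Node{I, {}}));
    Node *N = Items.back().get();
    Index.emplace(I, N);
    return N;
  }

  std::size_t size() const { return Items.size(); }
  const Node *at(std::size_t Idx) const {
    assert(Idx < Items.size() && "index out of range");
    return Items[Idx].get();
  }
};
```

With this shape, the loop that searches the worklist for a node whose instruction equals `D` would become a single `getOrCreate` call, while iteration by index still visits entries in the order they were added.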

================
Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:387-388
+
+  inline RematNode **child_begin(RematNode *N) { return N->Children.begin(); }
+  inline RematNode **child_end(RematNode *N) { return N->Children.end(); }
+
----------------
ChuanqiXu wrote:
> Are the 2 methods used?
No - it appears they aren't!
Based on the examples for using the RPOT template I thought they were.

================
Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:2225-2226

 // For every use of the value that is across suspend point, recreate that value
 // after a suspend point.
+static void rewriteMaterializableInstructions(
----------------
ChuanqiXu wrote:
> The comment looks not precise after we land this patch.
I'm not sure that the result of this patch is any different from what happened before - other than you might get more than 4 dependent instructions rematerialized.
What do you think needs changing here?

================
Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:2248
+    RematGraph *RG = E.second.get();
+    ReversePostOrderTraversal<RematGraph *> RPOT(RG);
+    using rpo_iterator = ReversePostOrderTraversal<RematGraph *>::rpo_iterator;
----------------
ChuanqiXu wrote:
> jsilvanus wrote:
> > ChuanqiXu wrote:
> > > It is expensive to create ReversePostOrderTraversal. So it looks not good to construct it in a loop.
> > Pedantically speaking, I'm not sure constructing the `ReversePostOrderTraversal` in a loop here is an issue: It being "expensive" just means it does the graph traversal in the constructor, so its run time is linear in the size of the graph.
> > But here we are using it to traverse *different* graphs, all of which have been constructed before, so the runtime can be amortized into the construction of those graphs, or also into the traversal that is done later.
> > 
> > What we should not do is re-creating `ReversePostOrderTraversal` iterator objects for the same graph in a loop, because that wastes runtime.
> > 
> > Still, one might argue that constructing all those graphs with overlapping nodes, i.e. possibly multiple graphs having a node for the same `Instruction*`, is a fundamental runtime issue. Not sure if that really can become an issue?
> Yeah, the key point here is how many overlapping nodes there are. Have you measured the compile time, run-time performance or memory usage? Then we can have a better feeling. For example, we can then decide if we want to limit the depth of the graph.
This work was done to speed up rematerialization. We needed a lot of rematerialization to happen, and initially just increased the number of iterations from 4 to a larger number.
This didn't work very well and was extremely slow - hence this re-work.

I haven't done timings for smaller amounts of remat, but I can do that if you think it is useful.

I did wonder though if limiting the depth with an option might be useful - we want as much as possible, but that's probably not true for all applications.

I'm not sure about the overlapping nodes actually being an issue here - I did attempt to create a test case that demonstrated this, but I'm not sure I was entirely successful (all the tests ended up with the minimum set for the instructions being rematerialized).
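
To make the cost argument quoted above concrete: a reverse post-order is a depth-first walk that visits each reachable node once and then reverses the finishing order, so building it is linear in the size of that one graph. The sketch below is a standalone illustration with hypothetical names (`GNode`, `postOrder`, `reversePostOrder`); it is not LLVM's `ReversePostOrderTraversal`, which does the equivalent walk in its constructor.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical graph node; Succs plays the role of the outgoing edges.
struct GNode {
  int Id;
  std::vector<GNode *> Succs;
};

// Depth-first walk that appends a node after all of its successors.
static void postOrder(GNode *N, std::unordered_set<GNode *> &Visited,
                      std::vector<GNode *> &Out) {
  if (!Visited.insert(N).second)
    return; // already visited: shared nodes are only walked once
  for (GNode *S : N->Succs)
    postOrder(S, Visited, Out);
  Out.push_back(N);
}

// Each reachable node is visited once and each edge is followed once,
// so the cost is linear in the number of nodes plus edges.
std::vector<GNode *> reversePostOrder(GNode *Root) {
  assert(Root && "expected a root node");
  std::unordered_set<GNode *> Visited;
  std::vector<GNode *> Order;
  postOrder(Root, Visited, Order);
  std::reverse(Order.begin(), Order.end());
  return Order;
}
```

Doing this once per distinct graph repeats no work within a graph. The open question in the thread is the separate one of how much the graphs themselves overlap.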

================
Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:2901-2907
+      // Manually add dbg.value metadata uses of I.
+      SmallVector<DbgValueInst *, 16> DVIs;
+      findDbgValues(DVIs, &I);
+      for (auto *DVI : DVIs)
+        if (Checker.isDefinitionAcrossSuspend(I, DVI))
+          Spills[&I].push_back(DVI);
+    }
----------------
ChuanqiXu wrote:
> This is not good. It may cause the behavior to become inconsistent after we materialize DVI instructions. See https://github.com/llvm/llvm-project/issues/55276 for an example.
I think this is here because I created the original patch on an older version of CoroFrame which did this.
Is removing this the right approach?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142620/new/

https://reviews.llvm.org/D142620