[PATCH] D142620: [Coroutines] Improve rematerialization stage

Mon Feb 6 18:31:13 PST 2023

ChuanqiXu added inline comments.

================
Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:324
+namespace {
+struct RematNode {
+  Instruction *Node;
----------------
Since `RematNode` is only used in `RematGraph`. I feel slightly better to make `RematNode ` a private class definition for `RematGraph`.

================
Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:339
+  RematGraph(Instruction *I, SuspendCrossingInfo &Checker) : Checker(Checker) {
+    std::unique_ptr<RematNode> FirstNode = std::make_unique<RematNode>(I);
+    EntryNode = FirstNode.get();
----------------
I feel slightly better to add an assertion here.

================
Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:388
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+  void dump() const;
+#endif
----------------
nit: It looks slightly better to provide a in-class definition for `dump()`. So we can reduce one `!defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)`.

================
Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:326
+  Instruction *Node;
+  SmallVector<RematNode *> Children;
+  RematNode() = default;
----------------
dstuttard wrote:
> ChuanqiXu wrote:
> > What does `Children` mean here?
> I just needed a reasonable name for the next nodes - they are defined as being one edge further away from the root of the graph, so seemed like a reasonable name to use.
> Do you think something else would be better?
I guess `operands` or `reversed_successor` or something similar to that may be better. What's more important here is that we lack a lot of comments here for `RematNode` and `RematGraph`. Otherwise the code readers can't understand what they mean.  

================
Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:360
+          if (materializable(*D) &&
+              Checker.isDefinitionAcrossSuspend(*D, FirstUse)) {
+            if (Remats.count(D)) {
----------------
dstuttard wrote:
> jsilvanus wrote:
> > Maybe the indentation here can be reduced a bit with early exiting out of the outermost if, moving the initialization of D out of the if, merging the two ifs above, and early exiting here as well?
> I think I get what you mean - I've updated it with less indenting.
We can still improve this. For example:

```
if (Remats.count(N->Node)) 
    return;
```

```
if (Remats.count(D)) {
          // Already have this in the graph
          N->Children.push_back(Remats[D].get());
          continue;
}

```

================
Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:366-372
+              for (auto &I : WorkList) {
+                if (I->Node == D) {
+                  NoMatch = false;
+                  N->Children.push_back(I.get());
+                  break;
+                }
+              }
----------------
dstuttard wrote:
> sebastian-ne wrote:
> > Maybe it makes sense to use a Set for the Worklist?
> Maybe - are you thinking that a set would remove the need to check for duplicates? I'm not sure it makes things much better - maybe it removes the needs to iterate the worklist, I can't remember if there's a requirement to do this in order though.
Given the graph should be a DAG, the order should not be important here. So a set here may be better.

================
Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:2225-2226

 // For every use of the value that is across suspend point, recreate that value
 // after a suspend point.
+static void rewriteMaterializableInstructions(
----------------
dstuttard wrote:
> ChuanqiXu wrote:
> > The comment looks not precise after we land this patch.
> I'm not sure that the result of this patch is any different from what happened before - other than you might get more than 4 dependent instructions rematerialized.
> What do you think needs changing here?
The comment is `For every use of "the value"`. However, rewriteMaterializableInstructions don't have `the value` from the signature. Also the reader can't know what is `RematGraph`. So it may be hard for users to understand.

================
Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:2248
+    RematGraph *RG = E.second.get();
+    ReversePostOrderTraversal<RematGraph *> RPOT(RG);
+    using rpo_iterator = ReversePostOrderTraversal<RematGraph *>::rpo_iterator;
----------------
dstuttard wrote:
> ChuanqiXu wrote:
> > jsilvanus wrote:
> > > ChuanqiXu wrote:
> > > > It is expensive to create ReversePostOrderTraversal. So it looks not good to construct it in a loop.
> > > Pedantically speaking, I'm not sure constructing the `ReversePostOrderTraversal` in a loop here is an issue: It being "expensive" just means it does the graph traversal in the constructor, so its run time is linear in the size of the graph.
> > > But here we are using it to traverse *different* graphs, all of which have been constructed before, so the runtime can be amortized into the construction of those graphs, or also into the traversal that is done later.
> > > 
> > > What we should not do is re-creating `ReversePostOrderTraversal` iterator objects for the same graph in a loop, because that wastes runtime.
> > > 
> > > Still, one might argue that constructing all those graphs with overlapping nodes, i.e. possibly multiple graphs having a node for the same `Instruction*`, is a fundamental runtime issue. Not sure if that really can become an issue?
> > Yeah, the key point here is that how many overlapping nodes there is. Have you measured the compile-time, run time performance or memory usages? Then we can have a better feeling. For example, we can decide if we want to limit the depth of the graph then.
> This work was done to speed up materialization. We needed a lot of rematerialization to happen, and initially just increased the number of iterations from 4 to a larger number.
> This didn't work very well and was extremely slow - hence this re-work.
> 
> I haven't done timings for smaller amounts of remat, but I can do that if you think it is useful.
> 
> I did wonder though if limiting the depth with an option might be useful - we want as much as possible, but that's probably not true for all applications.
> 
> I'm not sure about the overlapping nodes actually being an issue here - I did attempt to create a test case that demonstrated this, but I'm not sure I was entirely successful (all the tests ended up with the minimum set for the instructions being rematerialized).
> I haven't done timings for smaller amounts of remat, but I can do that if you think it is useful.
>
> I did wonder though if limiting the depth with an option might be useful - we want as much as possible, but that's probably not true for all applications.

I am not sure if it is a good idea to limit the depth too. I mean we need more data to make the decision. Since this change is not a pure win theoretically. There may be edge cases or everything would be fine. I am not blocking this. I just say we need more things to convince ourselves that this is a good change generally. Specially, folly may be a good choice to have a test. Or any other libraries that have a lot of  coroutines.

================
Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:2901-2907
+      // Manually add dbg.value metadata uses of I.
+      SmallVector<DbgValueInst *, 16> DVIs;
+      findDbgValues(DVIs, &I);
+      for (auto *DVI : DVIs)
+        if (Checker.isDefinitionAcrossSuspend(I, DVI))
+          Spills[&I].push_back(DVI);
+    }
----------------
dstuttard wrote:
> ChuanqiXu wrote:
> > This is not good. It may cause the the behavior become inconsistent after we materialize DVI instructions. See https://github.com/llvm/llvm-project/issues/55276 for an example.
> I think this is here because I created the original patch on an older version of CoroFrame which did this.
> Is removing this the right approach?
Yes, we should remove it.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142620/new/

https://reviews.llvm.org/D142620