[PATCH] D58919: WebAssembly: Irreducible control flow rewrite

Tue Mar 12 13:52:51 PDT 2019

kripken marked 17 inline comments as done.
kripken added a comment.

> What changes fixed the previous bug?

Basically the rewrite avoids the entire previous approach of "canonicalization", of representing an inner loop by its entry. When we got that wrong, we could reach very wrong results.

================
Comment at: lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp:30
+///   O(NumBlocks * NumNestedLoops * NumIrreducibleLoops +
+///     NumLoops * NumLoops)
 ///
----------------
aheejin wrote:
> Why is this?
The NumLoops^2 part? To find mutual loop entries, that is, loop entries that can reach each other, we compare each one to the others.

This is actually just the square of the number of loops in a single scope, so nested loops don't count. Anyhow, even in the "worst case" which is a loop after a loop after a loop, this factor is going to be quite small, I'd guess.

================
Comment at: lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp:167
+    // Find the loop entries - loopers reachable from non-loopers - and their
+    // loop enterers.
+    for (auto *Looper : Loopers) {
----------------
aheejin wrote:
> Does this definition cover when `Entry` is an entry (or header) of a loop? I'm wondering because the reachability computation here does not consider backedges to `Entry`, so it will not be discovered as a looper.
> 
> Also does it cover this case?
> ```
> bb0:
>   br bb1
> bb1:
>   br_if bb0
>   br bb2
> bb2:
>   br bb3
> bb3:
>   br bb2
> ```
> 
> So here bb0 and bb1 comprise a loop and bb2 and bb3 comprise another loop. And all four blocks are loopers, but bb0 and bb2 are loop entries. But they are not reachable from any of non-loopers, because there are no non-loopers in this CFG.
> (Also here bb0 is the header of a loop so it belongs to the first case where `Entry` is a header)
When we look at a region, the entry block cannot be a looper, which simplifies things. LLVM doesn't let a function entry be in a loop (maybe for similar reasons as why it's convenient here?), and when we recurse into a loop we ignore backedges to Entry intentionally (so we see the inner part of the loop, and not the looping).

In other words, if a block is a looper, it will be seen as such in the outer scope it is in (where it is not the entry). When it is the entry into a region, and we are looking at just that region, it cannot be a looper there.

For the second issue (with that example): yeah, the comment was not clear enough, I'll update it. They key is that a loop enterer is a block not in the same loop, that enters it. So here bb2 is a looper, and bb1 is a predecessor of it, and not in the same loop (bb2 can't reach bb1), so we identify bb2 as a loop entry and bb1 as a loop enterer for it.

================
Comment at: lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp:201
+    // from outside, we will traverse all the blocks in the loop.
+    BlockSet WorkList;
+    Blocks.insert(Entry);
----------------
aheejin wrote:
> Why is this worklist a set?
As a set, it won't have duplicate entries - so if we visit A that causes us to add C to the future work, and then visit B that also causes us to add C to the future work, then when we get to C we'll visit it once, and avoid visiting it again later (and not having anything to do the second time).

I haven't benchmarked this recently, but I remember it was beneficial in the Relooper.

================
Comment at: lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp:222
+
+class LoopFixer {
+public:
----------------
aheejin wrote:
> Do we need to make this as a separate class now? Unlike the previous version, we create `LoopFixer` object only once in `WebAssemblyFixIrreducibleControlFlow::runOnMachineFunction`. I think we can add paste the body of `LoopFixer::run` into `WebAssemblyFixIrreducibleControlFlow::runOnMachineFunction` and make `LoopFixer:: processRegion` as a member function of `WebAssemblyFixIrreducibleControlFlow` that returns a bool.
Good idea, thanks!

================
Comment at: lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp:270
+        }
       }
+      // Only go on to actually process the inner loops when we are done
----------------
aheejin wrote:
> I might be mistaken, but the algorithm seems to not care about whether loop entries found are within the same level loop. For example, the case where there are two-level nested loops and block A is one of the entries to the outer loop and B is one of the entries for the inner loop. Even in this case, we don't distinguish them and resolve these with a single pass with a single dispatch block. Is this OK and intended? And for the 'mutual entries' above, can't they also be not in the same loop but each in an outer loop and an inner loop?
Yeah, it's somewhat debatable (is an inner loop an "inner loop" if there's a branch going into it from several levels up?), but yes, when we see irreducibility we handle it now, instead of leaving it for the inner loop. We have to, I think - if we can branch to it now, we need to fix up that branch now - we wouldn't see it later when we recurse and ignore the outside. At least that would require a very different approach I suppose (which might be more optimal in some cases).

================
Comment at: lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp:275
         continue;
       }

----------------
aheejin wrote:
> - How about reducing depth of the subloop processing routine below by wrapping this loop here early, like
> ```
> while (FoundIrreducibility) {
>   ReachabilityGraph Graph(Entry, Blocks);
> 
>   bool FoundIrreducibility = false;
>   for (auto *LoopEntry : Graph.getLoopEntries()) {
>     ...
>   }
> }
> 
> for (auto *LoopEntry : Graph.getLoopEntries()) {
>   ...
> }
> ```
> 
> - Btw, is there any case we can find irreducibility again with the same region //before// we recurse into inner loops below? If there is, how is that case different from the case that's processed below on the inner loops?
The problem is that the Graph is used outside of the loop in your code. I agree it's a little awkward as it is... another option is to use an std::unique_ptr to allow using it from the outside. But that's less clear I think.

Yes, we might find irreducibility again if we have two irreducible loops one after the other (i.e. not nested). It's *possible* we don't need to look at the irreducible one we've already fixed, however, I don't have a proof of that. And we do need to recompute the Graph again after fixing one loop (since it can affect others; and trying to update it is fairly complex an optimization), so starting another loop iteration is the simple thing. The cost is another iteration per irreducible loop, which is not that bad since they are rare.

I'll try to improve the comment here.

================
Comment at: lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp:286
+        processRegion(LoopEntry, InnerBlocks.getBlocks());
       }

----------------
aheejin wrote:
> In which case do we need to recurse into inner loops? (Given that we didn't distinguish inner loops and outer loops above) Could you add some comments?
Good point, this is important to explain, adding.

================
Comment at: lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp:347
+        continue;
+      }

----------------
aheejin wrote:
> Is there any case `!Pair.second` is not set? Aren't all blocks in `SortedEntries` distinct?
This is code not changed in this patch (just some variable renaming), but I think you're right, fixing.

================
Comment at: lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp:371
+      DenseMap<MachineBasicBlock *, MachineBasicBlock *> Map;
+      for (auto *Entry : Pred->successors()) {
+        if (!Entries.count(Entry)) {
----------------
aheejin wrote:
> This loop does not seem to check if a corresponding `Split` block for `Entry` has been already created. In case `Entry` has more than one predecessors, `Split` for `Entry` is gonna be created multiple times and `Map[Entry]` will be overwritten. We should do something like this inside the loop:
> ```
> if (Map.count(Entry))
>   continue;
> ```
This is code that did not change in this patch (aside from names and clang-format), and I'm not sure I fully understand it (I think Dan wrote it?). But I don't think `Map[Entry]` can be overwritten? The shape of the code is
```
DenseMap<MachineBasicBlock *, MachineBasicBlock *> Map;
for (auto *Entry : Pred->successors()) {
  [..]
  Map[Entry] = Split;
}
```

================
Comment at: lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp:378
+        MachineBasicBlock *Split = MF.CreateMachineBasicBlock();
+        MF.insert(Pred->isLayoutSuccessor(Entry)
+                      ? MachineFunction::iterator(Entry)
----------------
aheejin wrote:
> Did you mean
> ```
> Entry->isLayoutSuccessor(Pred)
> ```
> ? In the current state all new blocks are gonna be appended at the end, because this condition is hardly likely to be satisfied.
As before, this is Dan's code that I didn't change here (aside from variable names). I'm not sure what it does...

================
Comment at: test/CodeGen/WebAssembly/irreducible-cfg.ll:224
+; CHECK-NOT: br_table
+define hidden void @ps_hints_apply() #0 {
+entry:
----------------
aheejin wrote:
> - If this is without irreducibility, shouldn't this be in `non-irreducible-cfg.ll` below?
> - Can we remove `#0`?
* The other file has -O0, and this one doesn't specify an opt level (so it's -O2 I guess?), so I kept them separate.
* Removed `#0`.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D58919/new/

https://reviews.llvm.org/D58919