[PATCH] D58919: WebAssembly: Irreducible control flow rewrite

Heejin Ahn via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 13 01:58:49 PDT 2019


aheejin added a comment.

- There are several places in the diff that says "Context not available". I think the diff has some problems. (I thought it needed rebase but maybe it's not relevant) Could you check?
- In this new algorithm, all tests in `irreducible-cfg-nested.ll`, `irreducible-cfg-nested2.ll`, and `irreducible-cfg-exceptions.ll` are reducible now, i.e. this pass does not do anything on them. Maybe we can delete them?



================
Comment at: lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp:167
+    // Find the loop entries - loopers reachable from non-loopers - and their
+    // loop enterers.
+    for (auto *Looper : Loopers) {
----------------
kripken wrote:
> aheejin wrote:
> > Does this definition cover when `Entry` is an entry (or header) of a loop? I'm wondering because the reachability computation here does not consider backedges to `Entry`, so it will not be discovered as a looper.
> > 
> > Also does it cover this case?
> > ```
> > bb0:
> >   br bb1
> > bb1:
> >   br_if bb0
> >   br bb2
> > bb2:
> >   br bb3
> > bb3:
> >   br bb2
> > ```
> > 
> > So here bb0 and bb1 comprise a loop and bb2 and bb3 comprise another loop. And all four blocks are loopers, but bb0 and bb2 are loop entries. But they are not reachable from any of non-loopers, because there are no non-loopers in this CFG.
> > (Also here bb0 is the header of a loop so it belongs to the first case where `Entry` is a header)
> When we look at a region, the entry block cannot be a looper, which simplifies things. LLVM doesn't let a function entry be in a loop (maybe for similar reasons as why it's convenient here?), and when we recurse into a loop we ignore backedges to Entry intentionally (so we see the inner part of the loop, and not the looping).
> 
> In other words, if a block is a looper, it will be seen as such in the outer scope it is in (where it is not the entry). When it is the entry into a region, and we are looking at just that region, it cannot be a looper there.
> 
> For the second issue (with that example): yeah, the comment was not clear enough, I'll update it. They key is that a loop enterer is a block not in the same loop, that enters it. So here bb2 is a looper, and bb1 is a predecessor of it, and not in the same loop (bb2 can't reach bb1), so we identify bb2 as a loop entry and bb1 as a loop enterer for it.
Oh I see. Thanks. I didn't know that function entries cannot be loop headers.


================
Comment at: lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp:201
+    // from outside, we will traverse all the blocks in the loop.
+    BlockSet WorkList;
+    Blocks.insert(Entry);
----------------
kripken wrote:
> aheejin wrote:
> > Why is this worklist a set?
> As a set, it won't have duplicate entries - so if we visit A that causes us to add C to the future work, and then visit B that also causes us to add C to the future work, then when we get to C we'll visit it once, and avoid visiting it again later (and not having anything to do the second time).
> 
> I haven't benchmarked this recently, but I remember it was beneficial in the Relooper.
But isn't it possible that by the time you try to add C for the second time, the first C has been already erased from the set so you do the unnecessary work again? Usually what I've seen was making worklist as a vector and have a separate set called something like `Visited`, to which visited blocks are added and not erased.


================
Comment at: lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp:270
+        }
       }
+      // Only go on to actually process the inner loops when we are done
----------------
kripken wrote:
> aheejin wrote:
> > I might be mistaken, but the algorithm seems to not care about whether loop entries found are within the same level loop. For example, the case where there are two-level nested loops and block A is one of the entries to the outer loop and B is one of the entries for the inner loop. Even in this case, we don't distinguish them and resolve these with a single pass with a single dispatch block. Is this OK and intended? And for the 'mutual entries' above, can't they also be not in the same loop but each in an outer loop and an inner loop?
> Yeah, it's somewhat debatable (is an inner loop an "inner loop" if there's a branch going into it from several levels up?), but yes, when we see irreducibility we handle it now, instead of leaving it for the inner loop. We have to, I think - if we can branch to it now, we need to fix up that branch now - we wouldn't see it later when we recurse and ignore the outside. At least that would require a very different approach I suppose (which might be more optimal in some cases).
Is there a condition in the code that we only consider inner loops that have entries from several levels up? I don't think I found one. We just treat all blocks within the region the same way.

So, for example, an outer loop consists of blocks A, B, C, D, E, and F, and among them E and F comprises an inner loop. The outer loop has two entries: A and D from the blocks outside, and the inner loop also has two entries: E and F. E's enterer is C and F's enterer is D. In this case, can we solve this case in one pass with a single dispatch block that branches into all of A, D, E, and F?

I guess I don't understand the algorithm :( Sorry for that. Please correct me if I'm mistaken.


================
Comment at: lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp:275
         continue;
       }
 
----------------
kripken wrote:
> aheejin wrote:
> > - How about reducing depth of the subloop processing routine below by wrapping this loop here early, like
> > ```
> > while (FoundIrreducibility) {
> >   ReachabilityGraph Graph(Entry, Blocks);
> > 
> >   bool FoundIrreducibility = false;
> >   for (auto *LoopEntry : Graph.getLoopEntries()) {
> >     ...
> >   }
> > }
> > 
> > for (auto *LoopEntry : Graph.getLoopEntries()) {
> >   ...
> > }
> > ```
> > 
> > - Btw, is there any case we can find irreducibility again with the same region //before// we recurse into inner loops below? If there is, how is that case different from the case that's processed below on the inner loops?
> The problem is that the Graph is used outside of the loop in your code. I agree it's a little awkward as it is... another option is to use an std::unique_ptr to allow using it from the outside. But that's less clear I think.
> 
> Yes, we might find irreducibility again if we have two irreducible loops one after the other (i.e. not nested). It's *possible* we don't need to look at the irreducible one we've already fixed, however, I don't have a proof of that. And we do need to recompute the Graph again after fixing one loop (since it can affect others; and trying to update it is fairly complex an optimization), so starting another loop iteration is the simple thing. The cost is another iteration per irreducible loop, which is not that bad since they are rare.
> 
> I'll try to improve the comment here.
I see. I missed the graph computation was there too.


================
Comment at: lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp:371
+      DenseMap<MachineBasicBlock *, MachineBasicBlock *> Map;
+      for (auto *Entry : Pred->successors()) {
+        if (!Entries.count(Entry)) {
----------------
kripken wrote:
> aheejin wrote:
> > This loop does not seem to check if a corresponding `Split` block for `Entry` has been already created. In case `Entry` has more than one predecessors, `Split` for `Entry` is gonna be created multiple times and `Map[Entry]` will be overwritten. We should do something like this inside the loop:
> > ```
> > if (Map.count(Entry))
> >   continue;
> > ```
> This is code that did not change in this patch (aside from names and clang-format), and I'm not sure I fully understand it (I think Dan wrote it?). But I don't think `Map[Entry]` can be overwritten? The shape of the code is
> ```
> DenseMap<MachineBasicBlock *, MachineBasicBlock *> Map;
> for (auto *Entry : Pred->successors()) {
>   [..]
>   Map[Entry] = Split;
> }
> ```
> 
Ah right, it's not gonna be overwritten. So the code structure is
```
for (MachineBasicBlock *Pred : AllPreds) {
  DenseMap<MachineBasicBlock *, MachineBasicBlock *> Map;
  for (auto *Entry : Pred->successors()) {
    ...
    Map[Entry] = Split;
  }
```
And in case we have entries E0 and E1, and there are predecessors P0 and P1, and the edges are like P0->E0, P0->E1, P1->E0, and P1->E1. So the both preds point to the both entries. In this case, each entry (E0 or E1) is processed twice in the inner for loop. Because `Map` is created within the outer loop, it's not gonna be overwritten, but `Split` for each E0 and E1 is gonna be created twice unnecessarily.

(I know it's not the part of the code that's changed in this CL; but I thought while we're writing the major part of the pass, maybe we can fix other things too. If you don't prefer that maybe I can submit that as a separate patch. Let me know which one you prefer.)


================
Comment at: lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp:378
+        MachineBasicBlock *Split = MF.CreateMachineBasicBlock();
+        MF.insert(Pred->isLayoutSuccessor(Entry)
+                      ? MachineFunction::iterator(Entry)
----------------
kripken wrote:
> aheejin wrote:
> > Did you mean
> > ```
> > Entry->isLayoutSuccessor(Pred)
> > ```
> > ? In the current state all new blocks are gonna be appended at the end, because this condition is hardly likely to be satisfied.
> As before, this is Dan's code that I didn't change here (aside from variable names). I'm not sure what it does...
(Because code parts have moved, this comment is not where it was anymore, so)
The code is
```
// This is a successor we need to rewrite.                               
MachineBasicBlock *Split = MF.CreateMachineBasicBlock();                 
MF.insert(Pred->isLayoutSuccessor(Entry)                                 
              ? MachineFunction::iterator(Entry)                         
              : MF.end(),                                                
          Split);
```

I think the intention of the original code was, when we insert the `Split` which is to serve as a stepping stone between `Pred` and `Entry`, if currently `Pred` is right before `Entry` in the BB order, we want to insert `Split` between them, which makes a natural order. If not, we just append the new `Split` at the end of the function. I think this bug was there all along for years, and this was all newly created blocks are appended at the end.

I know this is not related to this CL, but if you prefer you can fix it maybe, or I can submit that as a separate patch.


================
Comment at: lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp:390
+        BuildMI(*Split, Split->end(), DebugLoc(), TII.get(WebAssembly::BR))
+            .addMBB(Dispatch);
+        Split->addSuccessor(Dispatch);
----------------
aheejin wrote:
> Nit: There is a [[ https://github.com/llvm/llvm-project/blob/a946997c2482e4386549ee38b4bb154eb58efbb6/llvm/include/llvm/CodeGen/MachineInstrBuilder.h#L405-L410 | `BuildMI` function ]] that defaults to append instruction at the end, so you can just do
> ```
> BuildMI(Split, DebugLoc(), TII.get(WebAssembly::CONST_I32), Reg)
>     .addImm(Indices[Entry]);                                              
> BuildMI(Split, DebugLoc(), TII.get(WebAssembly::BR)).addMBB(Dispatch);
> ```
> (Note that `Split` does not have `*` anymore)
Ping


================
Comment at: lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp:230
+  bool processRegion(MachineBasicBlock *Entry, BlockSet &Blocks,
+                     MachineFunction &MF) {
+    bool Changed = false;
----------------
Nit: Can we take out this and `makeSingleEntryLoop` from the class definition? I don't think we want to inline these long functions and we can have one less level of indentation outside. I mean, outside the class,
```
bool WebAssemblyFixIrreducibleControlFlow::processRegion(...) {
  ...
}
```



================
Comment at: test/CodeGen/WebAssembly/irreducible-cfg.ll:224
+; CHECK-NOT: br_table
+define hidden void @ps_hints_apply() #0 {
+entry:
----------------
kripken wrote:
> aheejin wrote:
> > - If this is without irreducibility, shouldn't this be in `non-irreducible-cfg.ll` below?
> > - Can we remove `#0`?
> * The other file has -O0, and this one doesn't specify an opt level (so it's -O2 I guess?), so I kept them separate.
> * Removed `#0`.
- You can use two different commands and check them separately within a single file.
```
; RUN: llc < %s -O0 -asm-verbose=false -verify-machineinstrs -disable-block-placement -wasm-disable-explicit-locals -wasm-keep-registers | FileCheck %s
; RUN: llc < %s -O2 -asm-verbose=false -verify-machineinstrs -disable-block-placement -wasm-disable-explicit-locals -wasm-keep-registers | FileCheck -check-prefix=OPT %s

; OPT-NOT: br_table
define hidden void @ps_hints_apply() {
  ...
}

; CHECK-NOT: br_table
define hidden i32 @_Z15fannkuch_workerPv(i8* %_arg) {
  ...
}
```

[[ https://llvm.org/docs/CommandGuide/FileCheck.html#the-filecheck-check-prefix-option | Here ]] is the manual for this `-check-prefix` option.

- Nit: Can we remove `hidden` too?


================
Comment at: test/CodeGen/WebAssembly/non-irreducible-cfg.ll:12
+; CHECK-NOT: br_table
+define hidden i32 @_Z15fannkuch_workerPv(i8* %_arg) #0 {
+for.cond:                                         ; preds = %entry
----------------
Nit: Can we remove `hidden` and `#0`?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D58919/new/

https://reviews.llvm.org/D58919





More information about the llvm-commits mailing list