[PATCH] D59462: [WebAssembly] Optimize the number of routing blocks in FixIrreducibleCFG

Mon Mar 18 19:13:27 PDT 2019

aheejin marked an inline comment as done.
aheejin added a comment.

In D59462#1433718 <https://reviews.llvm.org/D59462#1433718>, @kripken wrote:

> Nice idea, and overall looks good to me. I didn't quite understand the TII part though.

`TargetInstrInfo::analyzeBranch` function's output is a bit complicated; we have to check the return value and three parameters that are passed (`TBB`, `FBB`, and `Cond`). The detailed description on the meaning of its output is here <https://github.com/llvm/llvm-project/blob/0200d62ec7a54b79313bc2c4ccbe51c2a8c5e940/llvm/include/llvm/CodeGen/TargetInstrInfo.h#L546-L578>. The default return value `true` means (not intuitively) the branch is not analyzable, and each target is supposed to override this function. Our overridden version is here <https://github.com/llvm/llvm-project/blob/0200d62ec7a54b79313bc2c4ccbe51c2a8c5e940/llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.cpp#L98-L158>.

Before, we add a routing block for each predecessor / entry combination, so if in the original code a predecessor falls through to an entry, we placed a routing block right there where the entry block was, maintaining the fall-through relationship, so that in the new code the predecessor falls through to the new routing block. But after this patch we don't create a routing block per pred/entry conbination; there can be only one routing block for multiple predecessors, and one of those predecessor might have fell through to the entry in the original code, i.e., it was the layout predecessor of the entry. But we create a routing block when we first encounter any predecessor, so it's possible the new routing block is not created right after the layout predecessor anymore. For example, when the original code looks like this:

  pred0:
    ...
    br label %entry
  pred1:
    ...
    (falls through to %entry)
  entry:
    ...

Before, we created two routing blocks, each for `pred0` and `pred1`, so the result will be like this:

  pred0:
    ...
    br label %routing0
  pred1:
    ...
    (falls through to %routing1)
  routing1:               // This `routing1` is created right after `pred1`, because `pred1` was the layout predecessor of `entry` and fell through to it
    i32.const 0
    br %dispatch
  entry:
    ...
  routing0:
    i32.const 0
    br %dispatch
  dispatch:
   br_table ...

But now we create only one routing block in this case, so we end up with

  pred0:
    ...
    br label %routing
  pred1:
    ...
    br label %routing            <-- We need this new branch
  entry:
    ...
  routing:
    i32.const 0
    br %dispatch
  dispatch:
   br_table ...

We need the new branch, because we already created `routing` when we first encountered `pred0`, and for `pred1` we need a branch. Ideally we can do some more optimization here by first checking if there is any layout predecessor among all predecessors and create the routing block right after that, but all of these can change anyway in CFGSort, so I'm not sure how much benefit it will give.

Anyway, so for the code:

  // In case the predecessor fell through to the successor in the original
  // CFG and and the new routing block is not the predecessor's layout
  // successor, we need to add a branch to the new routing block.
  MachineBasicBlock *TBB = nullptr, *FBB = nullptr;
  SmallVector<MachineOperand, 4> Cond;
  bool Analyzable = !TII.analyzeBranch(*Pred, TBB, FBB, Cond);
  if (Analyzable &&
      ((Cond.empty() && !TBB && !FBB) ||
       (!Cond.empty() && !FBB && Pred->isLayoutSuccessor(Succ))))
    BuildMI(Pred, DebugLoc(), TII.get(WebAssembly::BR)).addMBB(Routing);

- `Cond.empty() && !TBB && !FBB` is for when we don't have any branches and just fall through to the next block.
- `!Cond.empty() && !FBB && Pred->isLayoutSuccessor(Succ)` is for when we only have a conditional branch in the block and if that condition is not taken we fall through to the next block.

> Btw, are there no backend passes that would do this optimization anyhow (merge identical branches)?

There are, but all of them run before this pass, because this pass is the last pass in the wasm compilation pipeline that modifies CFG. (which is not strictly true, because we have to modify CFG in CFGStackify to resolve things for exception handling, but I mean in general.)

================
Comment at: lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp:390
+  // This map stores whether this predecessor is within this loop.
+  DenseMap<MachineBasicBlock *, bool> InLoop;
+  for (auto *Pred : AllPreds) {
----------------
kripken wrote:
> What is the advantage of a map to bool over a set?
None. It should've been a set. I think I was changing the implementation this way and that and somehow ended up with that weird map. :( Thanks for the catch.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59462/new/

https://reviews.llvm.org/D59462