[llvm] ea8c637 - [WebAssembly] Fix incorrect grouping and sorting of exceptions

Heejin Ahn via llvm-commits llvm-commits at lists.llvm.org
Tue Feb 23 14:55:12 PST 2021


Author: Heejin Ahn
Date: 2021-02-23T14:54:55-08:00
New Revision: ea8c6375e3330f181105106b3adb84ff9fa76a7c

URL: https://github.com/llvm/llvm-project/commit/ea8c6375e3330f181105106b3adb84ff9fa76a7c
DIFF: https://github.com/llvm/llvm-project/commit/ea8c6375e3330f181105106b3adb84ff9fa76a7c.diff

LOG: [WebAssembly] Fix incorrect grouping and sorting of exceptions

This CL is not big but contains changes that span multiple analyses and
passes. This description is very long because it tries to explain basics
on what each pass/analysis does and why we need this change on top of
that. Please feel free to skip parts that are not necessary for your
understanding.

---

`WasmEHFuncInfo` contains the mapping of <EH pad, the EH pad's next
unwind destination>. The value (unwind dest) here is where an exception
should end up when it is not caught by the key (EH pad). We record this
info in WasmEHPrepare to fix catch mismatches, because the CFG itself
does not have this info. A CFG only contains BBs and
predecessor-successor relationship between them, but in `WasmEHFuncInfo`
the unwind destination BB is not necessarily a successor or the key EH
pad BB. Their relationship can be intuitively explained by this C++ code
snippet:
```
try {
  try {
    foo();
  } catch (int) { // EH pad
    ...
  }
} catch (...) {   // unwind destination
}
```
So when `foo()` throws, it goes to `catch (int)` first. But if it is not
caught by it, it ends up in the next unwind destination `catch (...)`.
This unwind destination is what you see in `catchswitch`'s
`unwind label %bb` part.

---

`WebAssemblyExceptionInfo` groups exceptions so that they can be sorted
continuously together in CFGSort, as we do for loops. What this analysis
does is very simple: it creates a single `WebAssemblyException` per EH
pad, and all BBs that are dominated by that EH pad are included in this
exception. We also identify subexception relationship in this way: if
EHPad A domiantes EHPad B, EHPad B's exception is a subexception of
EHPad A's exception.

This simple rule turns out to be incorrect in some cases. In
`WasmEHFuncInfo`, if EHPad A's unwind destination is EHPad B, it means
semantically EHPad B should not be included in EHPad A's exception,
because it does not make sense to rethrow/delegate to an inner scope.
This is what happened in CFGStackify as a result of this:
```
try
  try
  catch
    ...   <- %dest_bb is among here!
  end
delegate %dest_bb
```

So this patch adds a phase in `WebAssemblyExceptionInfo::recalculate` to
make sure excptions' unwind destinations are not subexceptions of
their unwind sources in `WasmEHFuncInfo`.

But this alone does not prevent `dest_bb` in the example above from
being sorted within the inner `catch`'s exception, even if its exception
is not a subexception of that `catch`'s exception anymore, because of
how CFGSort works, which will be explained below.

---

CFGSort places BBs within the same `SortRegion` (loop or exception)
continuously together so they can be demarcated with `loop`-`end_loop`
or `catch`-`end_try` in CFGStackify.

`SortRegion` is a wrapper for one of `MachineLoop` or
`WebAssemblyException`. `SortRegionInfo` already does some complicated
things because there discrepancies between those two data structures.
`WebAssemblyException` is what we control, and it is defined as an EH
pad as its header and BBs dominated by the header as its BBs (with a
newly added exception of unwind destinations explained in the previous
paragraph). But `MachineLoop` is an LLVM data structure and uses the
standard loop detection algorithm. So by the algorithm, BBs that are 1.
dominated by the loop header and 2. have a path back to its header.
Because of the second condition, many BBs that are dominated by the loop
header are not included in the loop. So BBs that contain `return` or
branches to outside of the loop are not technically included in
`MachineLoop`, but they can be sorted together with the loop with no
problem.

Maybe to relax the condition, in CFGSort, when we are in a `SortRegion`
we allow sorting of not only BBs that belong to the current innermost
region but also BBs that are by the current region header.
(This was written this way from the first version written by Dan, when
only loops existed.) But now, we have cases in exceptions when EHPad B
is the unwind destination for EHPad A, even if EHPad B is dominated by
EHPad A it should not be included in EHPad A's exception, and should not
be sorted within EHPad A.

One way to make things work, at least correctly, is change `dominates`
condition to `contains` condition for `SortRegion` when sorting BBs, but
this will change compilation results for existing non-EH code and I
can't be sure it will not degrade performance or code size. I think it
will degrade performance because it will force many BBs dominated by a
loop, which don't have the path back to the header, to be placed after
the loop and it will likely to create more branches and blocks.

So this does a little hacky check when adding BBs to `Preferred` list:
(`Preferred` list is a ready list. CFGSort maintains ready list in two
priority queues: `Preferred` and `Ready`. I'm not very sure why, but it
was written that way from the beginning. BBs are first added to
`Preferred` list and then some of them are pushed to `Ready` list, so
here we only need to guard condition for `Preferred` list.)

When adding a BB to `Preferred` list, we check if that BB is an unwind
destination of another BB. To do this, this adds the reverse mapping,
`UnwindDestToSrc`, and getter methods to `WasmEHFuncInfo`. And if the BB
is an unwind destination, it checks if the current stack of regions
(`Entries`) contains its source BB by traversing the stack backwards. If
we find its unwind source in there, we add the BB to its `Deferred`
list, to make sure that unwind destination BB is added to `Preferred`
list only after that region with the unwind source BB is sorted and
popped from the stack.

---

This does not contain a new test that crashes because of this bug, but
this fix changes the result for one of existing test case. This test
case didn't crash because it fortunately didn't contain `delegate` to
the incorrectly placed unwind destination BB.

Fixes https://github.com/emscripten-core/emscripten/issues/13514.

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D97247

Added: 
    

Modified: 
    llvm/include/llvm/CodeGen/WasmEHFuncInfo.h
    llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
    llvm/lib/Target/WebAssembly/WebAssemblyCFGSort.cpp
    llvm/lib/Target/WebAssembly/WebAssemblyExceptionInfo.cpp
    llvm/lib/Target/WebAssembly/WebAssemblyExceptionInfo.h
    llvm/test/CodeGen/WebAssembly/cfg-stackify-eh.ll
    llvm/unittests/Target/WebAssembly/WebAssemblyExceptionInfoTest.cpp

Removed: 
    


################################################################################
diff  --git a/llvm/include/llvm/CodeGen/WasmEHFuncInfo.h b/llvm/include/llvm/CodeGen/WasmEHFuncInfo.h
index adcd16a5003f..fdd0a44ed001 100644
--- a/llvm/include/llvm/CodeGen/WasmEHFuncInfo.h
+++ b/llvm/include/llvm/CodeGen/WasmEHFuncInfo.h
@@ -32,27 +32,42 @@ struct WasmEHFuncInfo {
   // When there is an entry <A, B>, if an exception is not caught by A, it
   // should next unwind to the EH pad B.
   DenseMap<BBOrMBB, BBOrMBB> SrcToUnwindDest;
+  DenseMap<BBOrMBB, BBOrMBB> UnwindDestToSrc; // reverse map
 
   // Helper functions
   const BasicBlock *getUnwindDest(const BasicBlock *BB) const {
     return SrcToUnwindDest.lookup(BB).get<const BasicBlock *>();
   }
+  const BasicBlock *getUnwindSrc(const BasicBlock *BB) const {
+    return UnwindDestToSrc.lookup(BB).get<const BasicBlock *>();
+  }
   void setUnwindDest(const BasicBlock *BB, const BasicBlock *Dest) {
     SrcToUnwindDest[BB] = Dest;
+    UnwindDestToSrc[Dest] = BB;
   }
   bool hasUnwindDest(const BasicBlock *BB) const {
     return SrcToUnwindDest.count(BB);
   }
+  bool hasUnwindSrc(const BasicBlock *BB) const {
+    return UnwindDestToSrc.count(BB);
+  }
 
   MachineBasicBlock *getUnwindDest(MachineBasicBlock *MBB) const {
     return SrcToUnwindDest.lookup(MBB).get<MachineBasicBlock *>();
   }
+  MachineBasicBlock *getUnwindSrc(MachineBasicBlock *MBB) const {
+    return UnwindDestToSrc.lookup(MBB).get<MachineBasicBlock *>();
+  }
   void setUnwindDest(MachineBasicBlock *MBB, MachineBasicBlock *Dest) {
     SrcToUnwindDest[MBB] = Dest;
+    UnwindDestToSrc[Dest] = MBB;
   }
   bool hasUnwindDest(MachineBasicBlock *MBB) const {
     return SrcToUnwindDest.count(MBB);
   }
+  bool hasUnwindSrc(MachineBasicBlock *MBB) const {
+    return UnwindDestToSrc.count(MBB);
+  }
 };
 
 // Analyze the IR in the given function to build WasmEHFuncInfo.

diff  --git a/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp b/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
index b5f2c1545827..4ab29edfa351 100644
--- a/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
@@ -341,6 +341,13 @@ void FunctionLoweringInfo::set(const Function &fn, MachineFunction &mf,
       NewMap[MBBMap[Src]] = MBBMap[Dst];
     }
     EHInfo.SrcToUnwindDest = std::move(NewMap);
+    NewMap.clear();
+    for (auto &KV : EHInfo.UnwindDestToSrc) {
+      const auto *Src = KV.first.get<const BasicBlock *>();
+      const auto *Dst = KV.second.get<const BasicBlock *>();
+      NewMap[MBBMap[Src]] = MBBMap[Dst];
+    }
+    EHInfo.UnwindDestToSrc = std::move(NewMap);
   }
 }
 

diff  --git a/llvm/lib/Target/WebAssembly/WebAssemblyCFGSort.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyCFGSort.cpp
index eb3e9b91d40d..1ada9394c860 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyCFGSort.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyCFGSort.cpp
@@ -29,6 +29,7 @@
 #include "llvm/CodeGen/MachineLoopInfo.h"
 #include "llvm/CodeGen/MachineRegisterInfo.h"
 #include "llvm/CodeGen/Passes.h"
+#include "llvm/CodeGen/WasmEHFuncInfo.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/raw_ostream.h"
 using namespace llvm;
@@ -218,6 +219,7 @@ static void sortBlocks(MachineFunction &MF, const MachineLoopInfo &MLI,
                 CompareBlockNumbersBackwards>
       Ready;
 
+  const auto *EHInfo = MF.getWasmEHFuncInfo();
   SortRegionInfo SRI(MLI, WEI);
   SmallVector<Entry, 4> Entries;
   for (MachineBasicBlock *MBB = &MF.front();;) {
@@ -245,8 +247,33 @@ static void sortBlocks(MachineFunction &MF, const MachineLoopInfo &MLI,
         if (SuccL->getHeader() == Succ && SuccL->contains(MBB))
           continue;
       // Decrement the predecessor count. If it's now zero, it's ready.
-      if (--NumPredsLeft[Succ->getNumber()] == 0)
+      if (--NumPredsLeft[Succ->getNumber()] == 0) {
+        // When we are in a SortRegion, we allow sorting of not only BBs that
+        // belong to the current (innermost) region but also BBs that are
+        // dominated by the current region header. But we should not do this for
+        // exceptions because there can be cases in which, for example:
+        // EHPad A's unwind destination (where the exception lands when it is
+        // not caught by EHPad A) is EHPad B, so EHPad B does not belong to the
+        // exception dominated by EHPad A. But EHPad B is dominated by EHPad A,
+        // so EHPad B can be sorted within EHPad A's exception. This is
+        // incorrect because we may end up delegating/rethrowing to an inner
+        // scope in CFGStackify. So here we make sure those unwind destinations
+        // are deferred until their unwind source's exception is sorted.
+        if (EHInfo && EHInfo->hasUnwindSrc(Succ)) {
+          auto *UnwindSrc = EHInfo->getUnwindSrc(Succ);
+          bool IsDeferred = false;
+          for (Entry &E : reverse(Entries)) {
+            if (E.TheRegion->getHeader() == UnwindSrc) {
+              E.Deferred.push_back(Succ);
+              IsDeferred = true;
+              break;
+            }
+          }
+          if (IsDeferred)
+            continue;
+        }
         Preferred.push(Succ);
+      }
     }
     // Determine the block to follow MBB. First try to find a preferred block,
     // to preserve the original block order when possible.

diff  --git a/llvm/lib/Target/WebAssembly/WebAssemblyExceptionInfo.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyExceptionInfo.cpp
index c75de7aa207f..dfc135c92664 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyExceptionInfo.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyExceptionInfo.cpp
@@ -17,6 +17,7 @@
 #include "llvm/ADT/PostOrderIterator.h"
 #include "llvm/CodeGen/MachineDominanceFrontier.h"
 #include "llvm/CodeGen/MachineDominators.h"
+#include "llvm/CodeGen/WasmEHFuncInfo.h"
 #include "llvm/InitializePasses.h"
 
 using namespace llvm;
@@ -39,12 +40,13 @@ bool WebAssemblyExceptionInfo::runOnMachineFunction(MachineFunction &MF) {
   releaseMemory();
   auto &MDT = getAnalysis<MachineDominatorTree>();
   auto &MDF = getAnalysis<MachineDominanceFrontier>();
-  recalculate(MDT, MDF);
+  recalculate(MF, MDT, MDF);
   return false;
 }
 
 void WebAssemblyExceptionInfo::recalculate(
-    MachineDominatorTree &MDT, const MachineDominanceFrontier &MDF) {
+    MachineFunction &MF, MachineDominatorTree &MDT,
+    const MachineDominanceFrontier &MDF) {
   // Postorder traversal of the dominator tree.
   SmallVector<std::unique_ptr<WebAssemblyException>, 8> Exceptions;
   for (auto DomNode : post_order(&MDT)) {
@@ -56,6 +58,49 @@ void WebAssemblyExceptionInfo::recalculate(
     Exceptions.push_back(std::move(WE));
   }
 
+  // WasmEHFuncInfo contains a map of <catchpad, its next unwind destination>,
+  // which means, if an exception is not caught by the catchpad, it should end
+  // up in the next unwind destination stored in this data structure. (It is
+  // written as catchswitch's 'unwind' destination in ll files.) The below is an
+  // intuitive example of their relationship in C++ code:
+  // try {
+  //   try {
+  //   } catch (int) { // catchpad
+  //      ...          // this catch (int) { ... } is grouped as an exception
+  //   }
+  // } catch (...) {   // next unwind destination
+  // }
+  // (The example is try-catches for illustration purpose, but the unwind
+  // destination can be also a cleanuppad generated by destructor calls.) So the
+  // unwind destination is in the outside of the catchpad's exception.
+  //
+  // We group exceptions in this analysis simply by including all BBs dominated
+  // by an EH pad. But in case the EH pad's unwind destination does not have any
+  // children outside of the exception, that unwind destination ends up also
+  // being dominated by the EH pad and included in the exception, which is not
+  // semantically correct, because it unwinds/rethrows into an inner scope.
+  //
+  // Here we extract those unwind destinations from their (incorrect) parent
+  // exception. Note that the unwind destinations may not be an immediate
+  // children of the parent exception, so we have to traverse the parent chain.
+  const auto *EHInfo = MF.getWasmEHFuncInfo();
+  for (auto &MBB : MF) {
+    if (!MBB.isEHPad())
+      continue;
+    auto *EHPad = &MBB;
+    if (!EHInfo->hasUnwindDest(EHPad))
+      continue;
+    auto *UnwindDest = EHInfo->getUnwindDest(EHPad);
+    auto *WE = getExceptionFor(EHPad);
+    auto *UnwindDestWE = getExceptionFor(UnwindDest);
+    if (WE->contains(UnwindDestWE)) {
+      if (WE->getParentException())
+        UnwindDestWE->setParentException(WE->getParentException());
+      else
+        UnwindDestWE->setParentException(nullptr);
+    }
+  }
+
   // Add BBs to exceptions
   for (auto DomNode : post_order(&MDT)) {
     MachineBasicBlock *MBB = DomNode->getBlock();

diff  --git a/llvm/lib/Target/WebAssembly/WebAssemblyExceptionInfo.h b/llvm/lib/Target/WebAssembly/WebAssemblyExceptionInfo.h
index 50151ec8da5a..df18bec85e75 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyExceptionInfo.h
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyExceptionInfo.h
@@ -137,7 +137,7 @@ class WebAssemblyExceptionInfo final : public MachineFunctionPass {
 
   bool runOnMachineFunction(MachineFunction &) override;
   void releaseMemory() override;
-  void recalculate(MachineDominatorTree &MDT,
+  void recalculate(MachineFunction &MF, MachineDominatorTree &MDT,
                    const MachineDominanceFrontier &MDF);
   void getAnalysisUsage(AnalysisUsage &AU) const override;
 

diff  --git a/llvm/test/CodeGen/WebAssembly/cfg-stackify-eh.ll b/llvm/test/CodeGen/WebAssembly/cfg-stackify-eh.ll
index 722937638b7d..531a93c1e31a 100644
--- a/llvm/test/CodeGen/WebAssembly/cfg-stackify-eh.ll
+++ b/llvm/test/CodeGen/WebAssembly/cfg-stackify-eh.ll
@@ -97,40 +97,42 @@ try.cont:                                         ; preds = %catch, %catch2, %en
 
 ; CHECK-LABEL: test1
 ; CHECK: try
-; CHECK:   call      foo
+; CHECK:   call  foo
 ; CHECK: catch
 ; CHECK:   block
 ; CHECK:     block
-; CHECK:       br_if     0, {{.*}}                     # 0: down to label[[L0:[0-9]+]]
-; CHECK:       call      $drop=, __cxa_begin_catch
+; CHECK:       br_if     0, {{.*}}                     # 0: down to label[[L0:[0-9+]]]
+; CHECK:       call  $drop=, __cxa_begin_catch, $0
 ; CHECK:       try
-; CHECK:         call      foo
-; CHECK:         br        2                           # 2: down to label[[L1:[0-9]+]]
-; CHECK:       catch
 ; CHECK:         try
+; CHECK:           call  foo
+; CHECK:           br        3                         # 3: down to label[[L1:[0-9+]]]
+; CHECK:         catch
 ; CHECK:           block
-; CHECK:             br_if     0, {{.*}}               # 0: down to label[[L2:[0-9]+]]
-; CHECK:             call      $drop=, __cxa_begin_catch
-; CHECK:             try
-; CHECK:               call      foo
-; CHECK:               br        2                     # 2: down to label[[L3:[0-9]+]]
-; CHECK:             catch
-; CHECK:               call      __cxa_end_catch
-; CHECK:               rethrow   0                     # down to catch[[C0:[0-9]+]]
-; CHECK:             end_try
-; CHECK:           end_block                           # label[[L2]]:
-; CHECK:           rethrow   1                         # down to catch[[C0]]
-; CHECK:         catch_all                             # catch[[C0]]:
-; CHECK:           call      __cxa_end_catch
-; CHECK:           rethrow   0                         # to caller
-; CHECK:         end_try                               # label[[L3]]:
-; CHECK:         call      __cxa_end_catch
-; CHECK:         br        2                           # 2: down to label[[L1]]
+; CHECK:             block
+; CHECK:               br_if     0, {{.*}}             # 0: down to label[[L2:[0-9+]]]
+; CHECK:               call  $drop=, __cxa_begin_catch
+; CHECK:               try
+; CHECK:                 call  foo
+; CHECK:                 br        2                   # 2: down to label[[L3:[0-9+]]]
+; CHECK:               catch_all
+; CHECK:                 call  __cxa_end_catch
+; CHECK:                 rethrow   0                   # down to catch[[L4:[0-9+]]]
+; CHECK:               end_try
+; CHECK:             end_block                         # label[[L2]]:
+; CHECK:             rethrow   1                       # down to catch[[L4]]
+; CHECK:           end_block                           # label[[L3]]:
+; CHECK:           call  __cxa_end_catch
+; CHECK:           br        3                         # 3: down to label[[L1]]
+; CHECK:         end_try
+; CHECK:       catch_all                               # catch[[L4]]:
+; CHECK:         call  __cxa_end_catch
+; CHECK:         rethrow   0                           # to caller
 ; CHECK:       end_try
 ; CHECK:     end_block                                 # label[[L0]]:
 ; CHECK:     rethrow   1                               # to caller
 ; CHECK:   end_block                                   # label[[L1]]:
-; CHECK:   call      __cxa_end_catch
+; CHECK:   call  __cxa_end_catch
 ; CHECK: end_try
 define void @test1() personality i8* bitcast (i32 (...)* @__gxx_wasm_personality_v0 to i8*) {
 entry:

diff  --git a/llvm/unittests/Target/WebAssembly/WebAssemblyExceptionInfoTest.cpp b/llvm/unittests/Target/WebAssembly/WebAssemblyExceptionInfoTest.cpp
index c63328a278f7..003f8295ab68 100644
--- a/llvm/unittests/Target/WebAssembly/WebAssemblyExceptionInfoTest.cpp
+++ b/llvm/unittests/Target/WebAssembly/WebAssemblyExceptionInfoTest.cpp
@@ -169,7 +169,7 @@ body: |
   MachineDominanceFrontier MDF;
   MDT.runOnMachineFunction(*MF);
   MDF.getBase().analyze(MDT.getBase());
-  WEI.recalculate(MDT, MDF);
+  WEI.recalculate(*MF, MDT, MDF);
 
   // Exception info structure:
   // |- bb2 (ehpad), bb3, bb4, bb5, bb6, bb8, bb9, bb10
@@ -344,7 +344,7 @@ body: |
   MachineDominanceFrontier MDF;
   MDT.runOnMachineFunction(*MF);
   MDF.getBase().analyze(MDT.getBase());
-  WEI.recalculate(MDT, MDF);
+  WEI.recalculate(*MF, MDT, MDF);
 
   // Exception info structure:
   // |- bb1 (ehpad), bb2, bb3, bb4, bb5, bb6, bb7, bb8, bb10, bb11, bb12


        


More information about the llvm-commits mailing list