[llvm] 380fd09 - [WebAssembly] Fix unwind mismatches in new EH (#114361)

Tue Nov 5 09:40:45 PST 2024

Author: Heejin Ahn
Date: 2024-11-05T09:40:41-08:00
New Revision: 380fd09d982eb199e3c79834fc0f6dc92eb90239

URL: https://github.com/llvm/llvm-project/commit/380fd09d982eb199e3c79834fc0f6dc92eb90239
DIFF: https://github.com/llvm/llvm-project/commit/380fd09d982eb199e3c79834fc0f6dc92eb90239.diff

LOG: [WebAssembly] Fix unwind mismatches in new EH (#114361)

This fixes unwind mismatches for the new EH spec.

The main flow is similar to that of the legacy EH's unwind mismatch
fixing. The new EH shares the `fixCallUnwindMismatches` and
`fixCatchUnwindMismatches` functions with the legacy EH; these gather the
ranges of instructions whose unwind destinations need fixing. But unlike
the legacy EH, which uses `try`-`delegate`s to fix them, the new EH wraps
those instructions with nested `try_table`-`end_try_table`s that jump to a
"trampoline" BB, where we rethrow (using a `throw_ref`) the exception to
the correct `try_table`.

For a simple example of a call unwind mismatch, suppose `call foo`
should unwind to the outer `try_table` but is wrapped in another
`try_table` (not shown here):
```wast
try_table
  ...
  call foo    ;; Unwind mismatch. Should unwind to the outer try_table
  ...
end_try_table
```

Then we wrap the call with a new nested `try_table`-`end_try_table`, add
a `block` / `end_block` right inside the target `try_table`, make the
nested `try_table` jump to it using a `catch_all_ref` clause, and
rethrow the exception using a `throw_ref`:
```wast
try_table
  block $l0 exnref
    ...
    try_table (catch_all_ref $l0)
      call foo
    end_try_table
    ...
  end_block             ;; Trampoline BB
  throw_ref
end_try_table
```
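
The rewrite above can be modeled on a flat list of instruction strings. This is only a toy sketch: `wrapWithTrampoline` is a hypothetical helper and not part of LLVM, the label `$l0` is fixed for illustration, and the real pass works on `MachineInstr`s and basic blocks rather than strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy helper (not LLVM code). `body` holds the instructions
// inside the target try_table. Instructions body[first..last] (inclusive)
// are the ones whose unwind destination must be fixed.
std::vector<std::string> wrapWithTrampoline(const std::vector<std::string> &body,
                                            std::size_t first,
                                            std::size_t last) {
  assert(first <= last && last < body.size());
  std::vector<std::string> out;
  out.push_back("try_table");        // the existing target try_table
  out.push_back("block $l0 exnref"); // new block; its end is the trampoline
  for (std::size_t i = 0; i < body.size(); ++i) {
    if (i == first)
      out.push_back("try_table (catch_all_ref $l0)"); // new nested try_table
    out.push_back(body[i]);
    if (i == last)
      out.push_back("end_try_table"); // closes the nested try_table
  }
  out.push_back("end_block");     // trampoline: the exnref is on the stack
  out.push_back("throw_ref");     // rethrow inside the target try_table
  out.push_back("end_try_table"); // closes the target try_table
  return out;
}
```

Calling `wrapWithTrampoline({"...", "call foo", "..."}, 1, 1)` yields the same instruction sequence as the second listing above, without indentation.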

---

This also fixes two existing bugs, which are not easy to test
independently without the unwind mismatch fixing. The first one is how we
calculate `ScopeTops`. It turns out that we should do it in the same way
as in the legacy EH, even though there is no `end_try` at the end of the
`catch` block anymore. `nested_try` in `cfg-stackify-eh.ll` tests this case.
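
As a rough illustration of the `ScopeTops` bookkeeping, the sketch below uses plain block numbers instead of `MachineBasicBlock`s. `ScopeTopsModel` and its members are hypothetical names; the only behavior taken from the patch is that a `try_table` header is recorded for two end points: the BB holding the `end_try_table` and the BB after the (conceptual) catch block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model (not LLVM code). tops[N] is the number of the
// earliest header block whose scope ends at block N, or -1 if none does.
struct ScopeTopsModel {
  std::vector<int> tops;

  explicit ScopeTopsModel(std::size_t numBlocks) : tops(numBlocks, -1) {}

  // Keep only the farthest-spanning scope that ends at `end`.
  void update(int header, int end) {
    int &slot = tops[static_cast<std::size_t>(end)];
    if (slot == -1 || slot > header)
      slot = header;
  }

  // For a try_table, register both end points described above.
  void recordTryTable(int header, int endTryTableBlock, int contBlock) {
    update(header, endTryTableBlock);
    update(header, contBlock);
  }
};
```

With both mappings recorded, a later scope that ends at the continuation block can see that the `try_table` scope also ends there and has to be enclosed rather than placed next to it.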

The second bug is in `rewriteDepthImmediates`. `try_table`'s immediates
should be computed without the `try_table` itself, meaning
```wast
block
  try_table (catch ... 0)
  end_try_table
end_block
```
Here 0 should target not `end_try_table` but `end_block`. This bug
didn't crash the program because `placeTryTableMarker` generated only
the simple form of `try_table`, which has a single catch clause and is
followed by an `end_block` right after the `end_try_table` in the same
BB, so jumping to the `end_try_table` was the same as jumping to the
`end_block`. But now we generate `catch` clauses with depths greater than
0 when fixing unwind mismatches, which uncovered this bug.
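
The depth rule can be sketched with a stack of open scopes. This is a simplified model under stated assumptions: `ScopeStack`, `relativeDepth`, and `catchClauseDepth` are hypothetical names, scopes are identified by plain strings, and the actual `rewriteDepthImmediates` works on end markers and basic blocks instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model (not LLVM code): open scopes, innermost last.
using ScopeStack = std::vector<std::string>;

// Depth of `target` counted from the innermost open scope (0 = innermost).
// Returns -1 if `target` is not an open scope.
int relativeDepth(const ScopeStack &stack, const std::string &target) {
  for (std::size_t i = 0; i < stack.size(); ++i) {
    if (stack[stack.size() - 1 - i] == target)
      return static_cast<int>(i);
  }
  return -1;
}

// Depth for a catch clause of a try_table. `stack` has the try_table's own
// scope as its innermost entry; that entry is skipped, because catch clause
// immediates are computed without the try_table itself.
int catchClauseDepth(const ScopeStack &stack, const std::string &target) {
  assert(!stack.empty());
  ScopeStack outer(stack.begin(), stack.end() - 1);
  return relativeDepth(outer, target);
}
```

For the stack `{"block", "try_table"}`, an ordinary branch inside the `try_table` would reach the `block` at depth 1, while a catch clause of that `try_table` reaches it at depth 0, matching the example above.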

---

One case that needs special treatment is when an `end_loop` precedes an
`end_try_table` within a BB and this BB is a (true) unwind destination
when fixing unwind mismatches. In this case we need to split this
`end_loop` into a predecessor BB. This case is tested in
`unwind_mismatches_with_loop` in `cfg-stackify-eh.ll`.
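
The split can be sketched on a block represented as a list of opcode strings. This is a toy model, assuming string opcodes; `Block` and `splitEndLoop` are hypothetical names, and successor edges, which the real pass also updates, are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model (not LLVM code): a block is a list of opcodes.
using Block = std::vector<std::string>;

// Scan backwards for an end_try_table, then for the nearest end_loop that
// precedes it. Everything up to and including that end_loop moves into a
// new predecessor block. Returns {predecessor, remainder}; the predecessor
// is empty when no split is needed.
std::pair<Block, Block> splitEndLoop(const Block &bb) {
  bool seenEndTryTable = false;
  std::size_t splitAt = 0; // number of leading instructions to move
  for (std::size_t i = bb.size(); i-- > 0;) {
    if (bb[i] == "end_try_table") {
      seenEndTryTable = true;
      continue;
    }
    if (seenEndTryTable && bb[i] == "end_loop") {
      splitAt = i + 1;
      break;
    }
  }
  const auto mid = bb.begin() + static_cast<std::ptrdiff_t>(splitAt);
  return {Block(bb.begin(), mid), Block(mid, bb.end())};
}
```

After the split, a trampoline block holding `end_block` and `throw_ref` can be placed between the two halves, so the `end_loop` closes before the `end_block` does.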

---

`cfg-stackify-eh.ll` contains mostly the same set of tests as the
existing `cfg-stackify-eh-legacy.ll`, with updated FileCheck
expectations. As in `cfg-stackify-eh-legacy.ll`, the FileCheck lines
mostly contain only control flow instructions and calls, for readability.
- `nested_try` and `unwind_mismatches_with_loop` are added to test newly
found bugs in the new EH.
- Some tests in `cfg-stackify-eh-legacy.ll` about legacy-EH-specific
aspects have not been added to `cfg-stackify-eh.ll`.
(`remove_unnecessary_instrs`, `remove_unnecessary_br`,
`fix_function_end_return_type_with_try_catch`, and
`branch_remapping_after_fixing_unwind_mismatches_0/1`)

Added: 
    llvm/test/CodeGen/WebAssembly/cfg-stackify-eh.ll

Modified: 
    llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyMCTargetDesc.h
    llvm/lib/Target/WebAssembly/WebAssemblyCFGStackify.cpp

Removed: 
    


################################################################################
diff --git a/llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyMCTargetDesc.h b/llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyMCTargetDesc.h
index e3a60fa4812d8f..3900d4a0aa7044 100644
--- a/llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyMCTargetDesc.h
+++ b/llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyMCTargetDesc.h
@@ -478,6 +478,22 @@ inline bool isMarker(unsigned Opc) {
   }
 }
 
+inline bool isEndMarker(unsigned Opc) {
+  switch (Opc) {
+  case WebAssembly::END_BLOCK:
+  case WebAssembly::END_BLOCK_S:
+  case WebAssembly::END_LOOP:
+  case WebAssembly::END_LOOP_S:
+  case WebAssembly::END_TRY:
+  case WebAssembly::END_TRY_S:
+  case WebAssembly::END_TRY_TABLE:
+  case WebAssembly::END_TRY_TABLE_S:
+    return true;
+  default:
+    return false;
+  }
+}
+
 inline bool isTry(unsigned Opc) {
   switch (Opc) {
   case WebAssembly::TRY:
@@ -510,6 +526,20 @@ inline bool isCatch(unsigned Opc) {
   }
 }
 
+inline bool isCatchAll(unsigned Opc) {
+  switch (Opc) {
+  case WebAssembly::CATCH_ALL_LEGACY:
+  case WebAssembly::CATCH_ALL_LEGACY_S:
+  case WebAssembly::CATCH_ALL:
+  case WebAssembly::CATCH_ALL_S:
+  case WebAssembly::CATCH_ALL_REF:
+  case WebAssembly::CATCH_ALL_REF_S:
+    return true;
+  default:
+    return false;
+  }
+}
+
 inline bool isLocalGet(unsigned Opc) {
   switch (Opc) {
   case WebAssembly::LOCAL_GET_I32:

diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyCFGStackify.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyCFGStackify.cpp
index 2efab407a85682..6c13beb5feaa88 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyCFGStackify.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyCFGStackify.cpp
@@ -78,13 +78,19 @@ class WebAssemblyCFGStackify final : public MachineFunctionPass {
   void placeTryMarker(MachineBasicBlock &MBB);
   void placeTryTableMarker(MachineBasicBlock &MBB);
 
-  // Exception handling related functions
+  // Unwind mismatch fixing for exception handling
+  // - Common functions
   bool fixCallUnwindMismatches(MachineFunction &MF);
   bool fixCatchUnwindMismatches(MachineFunction &MF);
+  void recalculateScopeTops(MachineFunction &MF);
+  // - Legacy EH
   void addNestedTryDelegate(MachineInstr *RangeBegin, MachineInstr *RangeEnd,
                             MachineBasicBlock *UnwindDest);
-  void recalculateScopeTops(MachineFunction &MF);
   void removeUnnecessaryInstrs(MachineFunction &MF);
+  // - Standard EH (exnref)
+  void addNestedTryTable(MachineInstr *RangeBegin, MachineInstr *RangeEnd,
+                         MachineBasicBlock *UnwindDest);
+  MachineBasicBlock *getTrampolineBlock(MachineBasicBlock *UnwindDest);
 
   // Wrap-up
   using EndMarkerInfo =
@@ -111,6 +117,9 @@ class WebAssemblyCFGStackify final : public MachineFunctionPass {
   // <EH pad, TRY marker> map
   DenseMap<const MachineBasicBlock *, MachineInstr *> EHPadToTry;
 
+  DenseMap<const MachineBasicBlock *, MachineBasicBlock *>
+      UnwindDestToTrampoline;
+
   // We need an appendix block to place 'end_loop' or 'end_try' marker when the
   // loop / exception bottom block is the last block in a function
   MachineBasicBlock *AppendixBB = nullptr;
@@ -119,11 +128,27 @@ class WebAssemblyCFGStackify final : public MachineFunctionPass {
       AppendixBB = MF.CreateMachineBasicBlock();
       // Give it a fake predecessor so that AsmPrinter prints its label.
       AppendixBB->addSuccessor(AppendixBB);
-      MF.push_back(AppendixBB);
+      // If the caller trampoline BB exists, insert the appendix BB before it.
+      // Otherwise insert it at the end of the function.
+      if (CallerTrampolineBB)
+        MF.insert(CallerTrampolineBB->getIterator(), AppendixBB);
+      else
+        MF.push_back(AppendixBB);
     }
     return AppendixBB;
   }
 
+  // Create a caller-dedicated trampoline BB to be used for fixing unwind
+  // mismatches where the unwind destination is the caller.
+  MachineBasicBlock *CallerTrampolineBB = nullptr;
+  MachineBasicBlock *getCallerTrampolineBlock(MachineFunction &MF) {
+    if (!CallerTrampolineBB) {
+      CallerTrampolineBB = MF.CreateMachineBasicBlock();
+      MF.push_back(CallerTrampolineBB);
+    }
+    return CallerTrampolineBB;
+  }
+
   // Before running rewriteDepthImmediates function, 'delegate' has a BB as its
   // destination operand. getFakeCallerBlock() returns a fake BB that will be
   // used for the operand when 'delegate' needs to rethrow to the caller. This
@@ -691,12 +716,20 @@ void WebAssemblyCFGStackify::placeTryTableMarker(MachineBasicBlock &MBB) {
   if (!Header)
     return;
 
-  assert(&MBB != &MF.front() && "Header blocks shouldn't have predecessors");
-  MachineBasicBlock *LayoutPred = MBB.getPrevNode();
+  // Unlike the end_try marker, we don't place an end marker at the end of
+  // exception bottom, i.e., at the end of the old 'catch' block. But we still
+  // consider the try-catch part as a scope when computing ScopeTops.
+  WebAssemblyException *WE = WEI.getExceptionFor(&MBB);
+  assert(WE);
+  MachineBasicBlock *Bottom = SRI.getBottom(WE);
+  auto Iter = std::next(Bottom->getIterator());
+  if (Iter == MF.end())
+    Iter--;
+  MachineBasicBlock *Cont = &*Iter;
 
   // If the nearest common dominator is inside a more deeply nested context,
   // walk out to the nearest scope which isn't more deeply nested.
-  for (MachineFunction::iterator I(LayoutPred), E(Header); I != E; --I) {
+  for (MachineFunction::iterator I(Bottom), E(Header); I != E; --I) {
     if (MachineBasicBlock *ScopeTop = ScopeTops[I->getNumber()]) {
       if (ScopeTop->getNumber() > Header->getNumber()) {
         // Skip over an intervening scope.
@@ -905,14 +938,52 @@ void WebAssemblyCFGStackify::placeTryTableMarker(MachineBasicBlock &MBB) {
       BuildMI(MBB, InsertPos, MBB.findPrevDebugLoc(InsertPos),
               TII.get(WebAssembly::END_BLOCK));
   registerScope(Block, EndBlock);
+
   // Track the farthest-spanning scope that ends at this point.
-  updateScopeTops(Header, &MBB);
+  // Unlike the end_try, even if we don't put a end marker at the end of catch
+  // block, we still have to create two mappings: (BB with 'end_try_table' -> BB
+  // with 'try_table') and (BB after the (conceptual) catch block -> BB with
+  // 'try_table').
+  //
+  // This is what can happen if we don't create the latter mapping:
+  //
+  // Suppose in the legacy EH we have this code:
+  // try
+  //   try
+  //     code1
+  //   catch (a)
+  //   end_try
+  //   code2
+  // catch (b)
+  // end_try
+  //
+  // If we don't create the latter mapping, try_table markers would be placed
+  // like this:
+  // try_table
+  //   code1
+  // end_try_table (a)
+  // try_table
+  //   code2
+  // end_try_table (b)
+  //
+  // This does not reflect the original structure, and more important problem
+  // is, in case 'code1' has an unwind mismatch and should unwind to
+  // 'end_try_table (b)' rather than 'end_try_table (a)', we don't have a way to
+  // make it jump after 'end_try_table (b)' without creating another block. So
+  // even if we don't place 'end_try' marker at the end of 'catch' block
+  // anymore, we create ScopeTops mapping the same way as the legacy exception,
+  // so the resulting code will look like:
+  // try_table
+  //   try_table
+  //     code1
+  //   end_try_table (a)
+  //   code2
+  // end_try_table (b)
+  for (auto *End : {&MBB, Cont})
+    updateScopeTops(Header, End);
 }
 
 void WebAssemblyCFGStackify::removeUnnecessaryInstrs(MachineFunction &MF) {
-  if (WebAssembly::WasmEnableExnref)
-    return;
-
   const auto &TII = *MF.getSubtarget<WebAssemblySubtarget>().getInstrInfo();
 
   // When there is an unconditional branch right before a catch instruction and
@@ -1216,7 +1287,291 @@ void WebAssemblyCFGStackify::addNestedTryDelegate(
   registerTryScope(Try, Delegate, nullptr);
 }
 
+// Given an unwind destination, return a trampoline BB. A trampoline BB is a
+// destination of a nested try_table inserted to fix an unwind mismatch. It
+// contains an end_block, which is the target of the try_table, and a throw_ref,
+// to rethrow the exception to the right try_table.
+// try_table (catch ... )
+//   block exnref
+//     ...
+//     try_table (catch_all_ref N)
+//       some code
+//     end_try_table
+//     ...
+//   end_block                      ;; Trampoline BB
+//   throw_ref
+// end_try_table
+MachineBasicBlock *
+WebAssemblyCFGStackify::getTrampolineBlock(MachineBasicBlock *UnwindDest) {
+  // We need only one trampoline BB per unwind destination, even if multiple
+  // try_tables target the same unwind destination. If we have already
+  // created one for the given UnwindDest, return it.
+  auto It = UnwindDestToTrampoline.find(UnwindDest);
+  if (It != UnwindDestToTrampoline.end())
+    return It->second;
+
+  auto &MF = *UnwindDest->getParent();
+  auto &MRI = MF.getRegInfo();
+  const auto &TII = *MF.getSubtarget<WebAssemblySubtarget>().getInstrInfo();
+
+  MachineInstr *Block = nullptr;
+  MachineBasicBlock *TrampolineBB = nullptr;
+  DebugLoc EndDebugLoc;
+
+  if (UnwindDest == getFakeCallerBlock(MF)) {
+    // If the unwind destination is the caller, create a caller-dedicated
+    // trampoline BB at the end of the function and wrap the whole function with
+    // a block.
+    auto BeginPos = MF.begin()->begin();
+    while (WebAssembly::isArgument(BeginPos->getOpcode()))
+      BeginPos++;
+    Block = BuildMI(*MF.begin(), BeginPos, MF.begin()->begin()->getDebugLoc(),
+                    TII.get(WebAssembly::BLOCK))
+                .addImm(int64_t(WebAssembly::BlockType::Exnref));
+    TrampolineBB = getCallerTrampolineBlock(MF);
+    MachineBasicBlock *PrevBB = &*std::prev(CallerTrampolineBB->getIterator());
+    EndDebugLoc = PrevBB->findPrevDebugLoc(PrevBB->end());
+  } else {
+    // If the unwind destination is another EH pad, create a trampoline BB for
+    // the unwind dest and insert a block instruction right after the target
+    // try_table.
+    auto *TargetBeginTry = EHPadToTry[UnwindDest];
+    auto *TargetEndTry = BeginToEnd[TargetBeginTry];
+    auto *TargetBeginBB = TargetBeginTry->getParent();
+    auto *TargetEndBB = TargetEndTry->getParent();
+
+    Block = BuildMI(*TargetBeginBB, std::next(TargetBeginTry->getIterator()),
+                    TargetBeginTry->getDebugLoc(), TII.get(WebAssembly::BLOCK))
+                .addImm(int64_t(WebAssembly::BlockType::Exnref));
+    TrampolineBB = MF.CreateMachineBasicBlock();
+    EndDebugLoc = TargetEndTry->getDebugLoc();
+    MF.insert(TargetEndBB->getIterator(), TrampolineBB);
+    TrampolineBB->addSuccessor(UnwindDest);
+  }
+
+  // Insert an end_block, catch_all_ref (pseudo instruction), and throw_ref
+  // instructions in the trampoline BB.
+  MachineInstr *EndBlock =
+      BuildMI(TrampolineBB, EndDebugLoc, TII.get(WebAssembly::END_BLOCK));
+  auto ExnReg = MRI.createVirtualRegister(&WebAssembly::EXNREFRegClass);
+  BuildMI(TrampolineBB, EndDebugLoc, TII.get(WebAssembly::CATCH_ALL_REF))
+      .addDef(ExnReg);
+  BuildMI(TrampolineBB, EndDebugLoc, TII.get(WebAssembly::THROW_REF))
+      .addReg(ExnReg);
+
+  registerScope(Block, EndBlock);
+  UnwindDestToTrampoline[UnwindDest] = TrampolineBB;
+  return TrampolineBB;
+}
+
+// Wrap the given range of instructions with a try_table-end_try_table that
+// targets 'UnwindDest'. RangeBegin and RangeEnd are inclusive.
+void WebAssemblyCFGStackify::addNestedTryTable(MachineInstr *RangeBegin,
+                                               MachineInstr *RangeEnd,
+                                               MachineBasicBlock *UnwindDest) {
+  auto *BeginBB = RangeBegin->getParent();
+  auto *EndBB = RangeEnd->getParent();
+
+  MachineFunction &MF = *BeginBB->getParent();
+  const auto &MFI = *MF.getInfo<WebAssemblyFunctionInfo>();
+  const auto &TII = *MF.getSubtarget<WebAssemblySubtarget>().getInstrInfo();
+
+  // Get the trampoline BB that the new try_table will unwind to.
+  auto *TrampolineBB = getTrampolineBlock(UnwindDest);
+
+  // Local expression tree before the first call of this range should go
+  // after the nested TRY_TABLE.
+  SmallPtrSet<const MachineInstr *, 4> AfterSet;
+  AfterSet.insert(RangeBegin);
+  for (auto I = MachineBasicBlock::iterator(RangeBegin), E = BeginBB->begin();
+       I != E; --I) {
+    if (std::prev(I)->isDebugInstr() || std::prev(I)->isPosition())
+      continue;
+    if (WebAssembly::isChild(*std::prev(I), MFI))
+      AfterSet.insert(&*std::prev(I));
+    else
+      break;
+  }
+
+  // Create the nested try_table instruction.
+  auto TryTablePos = getLatestInsertPos(
+      BeginBB, SmallPtrSet<const MachineInstr *, 4>(), AfterSet);
+  MachineInstr *TryTable =
+      BuildMI(*BeginBB, TryTablePos, RangeBegin->getDebugLoc(),
+              TII.get(WebAssembly::TRY_TABLE))
+          .addImm(int64_t(WebAssembly::BlockType::Void))
+          .addImm(1) // # of catch clauses
+          .addImm(wasm::WASM_OPCODE_CATCH_ALL_REF)
+          .addMBB(TrampolineBB);
+
+  // Create a BB to insert the 'end_try_table' instruction.
+  MachineBasicBlock *EndTryTableBB = MF.CreateMachineBasicBlock();
+  EndTryTableBB->addSuccessor(TrampolineBB);
+
+  auto SplitPos = std::next(RangeEnd->getIterator());
+  if (SplitPos == EndBB->end()) {
+    // If the range's end instruction is at the end of the BB, insert the new
+    // end_try_table BB after the current BB.
+    MF.insert(std::next(EndBB->getIterator()), EndTryTableBB);
+    EndBB->addSuccessor(EndTryTableBB);
+
+  } else {
+    // When the split pos is in the middle of a BB, we split the BB into two and
+    // put the 'end_try_table' BB in between. We normally create a split BB and
+    // make it a successor of the original BB (CatchAfterSplit == false), but in
+    // case the BB is an EH pad and there is a 'catch' after split pos
+    // (CatchAfterSplit == true), we should preserve the BB's property,
+    // including that it is an EH pad, in the later part of the BB, where the
+    // 'catch' is.
+    bool CatchAfterSplit = false;
+    if (EndBB->isEHPad()) {
+      for (auto I = MachineBasicBlock::iterator(SplitPos), E = EndBB->end();
+           I != E; ++I) {
+        if (WebAssembly::isCatch(I->getOpcode())) {
+          CatchAfterSplit = true;
+          break;
+        }
+      }
+    }
+
+    MachineBasicBlock *PreBB = nullptr, *PostBB = nullptr;
+    if (!CatchAfterSplit) {
+      // If the range's end instruction is in the middle of the BB, we split the
+      // BB into two and insert the end_try_table BB in between.
+      // - Before:
+      // bb:
+      //   range_end
+      //   other_insts
+      //
+      // - After:
+      // pre_bb: (previous 'bb')
+      //   range_end
+      // end_try_table_bb: (new)
+      //   end_try_table
+      // post_bb: (new)
+      //   other_insts
+      PreBB = EndBB;
+      PostBB = MF.CreateMachineBasicBlock();
+      MF.insert(std::next(PreBB->getIterator()), PostBB);
+      MF.insert(std::next(PreBB->getIterator()), EndTryTableBB);
+      PostBB->splice(PostBB->end(), PreBB, SplitPos, PreBB->end());
+      PostBB->transferSuccessors(PreBB);
+    } else {
+      // - Before:
+      // ehpad:
+      //   range_end
+      //   catch
+      //   ...
+      //
+      // - After:
+      // pre_bb: (new)
+      //   range_end
+      // end_try_table: (new)
+      //   end_try_table
+      // post_bb: (previous 'ehpad')
+      //   catch
+      //   ...
+      assert(EndBB->isEHPad());
+      PreBB = MF.CreateMachineBasicBlock();
+      PostBB = EndBB;
+      MF.insert(PostBB->getIterator(), PreBB);
+      MF.insert(PostBB->getIterator(), EndTryTableBB);
+      PreBB->splice(PreBB->end(), PostBB, PostBB->begin(), SplitPos);
+      // We don't need to transfer predecessors of the EH pad to 'PreBB',
+      // because an EH pad's predecessors are all through unwind edges and they
+      // should still unwind to the EH pad, not PreBB.
+    }
+    unstackifyVRegsUsedInSplitBB(*PreBB, *PostBB);
+    PreBB->addSuccessor(EndTryTableBB);
+    PreBB->addSuccessor(PostBB);
+  }
+
+  // Add a 'end_try_table' instruction in the EndTryTable BB created above.
+  MachineInstr *EndTryTable = BuildMI(EndTryTableBB, RangeEnd->getDebugLoc(),
+                                      TII.get(WebAssembly::END_TRY_TABLE));
+  registerTryScope(TryTable, EndTryTable, nullptr);
+}
+
+// In the standard (exnref) EH, we fix unwind mismatches by adding a new
+// block~end_block inside of the unwind destination try_table~end_try_table:
+// try_table ...
+//   block exnref                   ;; (new)
+//     ...
+//     try_table (catch_all_ref N)  ;; (new) to trampoline BB
+//       code
+//     end_try_table                ;; (new)
+//     ...
+//   end_block                      ;; (new) trampoline BB
+//   throw_ref                      ;; (new)
+// end_try_table
+//
+// To do this, we will create a new BB that will contain the new 'end_block' and
+// 'throw_ref' and insert it before the 'end_try_table' BB.
+//
+// But there are cases when there are 'end_loop'(s) before the 'end_try_table'
+// in the same BB. (There can't be 'end_block' before 'end_try_table' in the
+// same BB because EH pads can't be directly branched to.) Then after fixing
+// unwind mismatches this will create the mismatching markers like below:
+// bb0:
+//   try_table
+//   block exnref
+//   ...
+//   loop
+//   ...
+// new_bb:
+//   end_block
+// end_try_table_bb:
+//   end_loop
+//   end_try_table
+//
+// So if the unwind dest BB has a end_loop before an end_try_table, we split the
+// BB with the end_loop as a separate BB before the end_try_table BB, so that
+// after we fix the unwind mismatch, the code will be like:
+// bb0:
+//   try_table
+//   block exnref
+//   ...
+//   loop
+//   ...
+// end_loop_bb:
+//   end_loop
+// new_bb:
+//   end_block
+// end_try_table_bb:
+//   end_try_table
+static void splitEndLoopBB(MachineBasicBlock *UnwindDest) {
+  auto &MF = *UnwindDest->getParent();
+  MachineInstr *EndTryTable = nullptr, *EndLoop = nullptr;
+  for (auto &MI : reverse(*UnwindDest)) {
+    if (MI.getOpcode() == WebAssembly::END_TRY_TABLE) {
+      EndTryTable = &MI;
+      continue;
+    }
+    if (EndTryTable && MI.getOpcode() == WebAssembly::END_LOOP) {
+      EndLoop = &MI;
+      break;
+    }
+  }
+  if (!EndLoop)
+    return;
+
+  auto *EndLoopBB = MF.CreateMachineBasicBlock();
+  MF.insert(UnwindDest->getIterator(), EndLoopBB);
+  auto SplitPos = std::next(EndLoop->getIterator());
+  EndLoopBB->splice(EndLoopBB->end(), UnwindDest, UnwindDest->begin(),
+                    SplitPos);
+  EndLoopBB->addSuccessor(UnwindDest);
+}
+
 bool WebAssemblyCFGStackify::fixCallUnwindMismatches(MachineFunction &MF) {
+  // This function is used for both the legacy EH and the standard (exnref) EH,
+  // and the reason we have unwind mismatches is the same for the both of them,
+  // but the code examples in the comments are going to be different. To make
+  // the description less confusing, we write the basically same comments twice,
+  // once for the legacy EH and the standard EH.
+  //
+  // -- Legacy EH --------------------------------------------------------------
+  //
   // Linearizing the control flow by placing TRY / END_TRY markers can create
   // mismatches in unwind destinations for throwing instructions, such as calls.
   //
@@ -1335,12 +1690,128 @@ bool WebAssemblyCFGStackify::fixCallUnwindMismatches(MachineFunction &MF) {
   // couldn't happen, because may-throwing instruction there had an unwind
   // destination, i.e., it was an invoke before, and there could be only one
   // invoke within a BB.)
+  //
+  // -- Standard EH ------------------------------------------------------------
+  //
+  // Linearizing the control flow by placing TRY_TABLE / END_TRY_TABLE markers
+  // can create mismatches in unwind destinations for throwing instructions,
+  // such as calls.
+  //
+  // We use a nested 'try_table'~'end_try_table' instruction to fix the
+  // unwind mismatches. try_table's catch clauses take an immediate argument
+  // that specifies which block we should branch to.
+  //
+  // 1. When an instruction may throw, but the EH pad it will unwind to can be
+  //    different from the original CFG.
+  //
+  // Example: we have the following CFG:
+  // bb0:
+  //   call @foo    ; if it throws, unwind to bb2
+  // bb1:
+  //   call @bar    ; if it throws, unwind to bb3
+  // bb2 (ehpad):
+  //   catch
+  //   ...
+  // bb3 (ehpad)
+  //   catch
+  //   ...
+  //
+  // And the CFG is sorted in this order. Then after placing TRY_TABLE markers
+  // (and BLOCK markers for the TRY_TABLE's destinations), it will look like:
+  // (BB markers are omitted)
+  // block
+  //   try_table (catch ... 0)
+  //     block
+  //       try_table (catch ... 0)
+  //         call @foo
+  //         call @bar              ;; if it throws, unwind to bb3
+  //       end_try_table
+  //     end_block                  ;; ehpad (bb2)
+  //     ...
+  //   end_try_table
+  // end_block                      ;; ehpad (bb3)
+  // ...
+  //
+  // Now if bar() throws, it is going to end up in bb2, not bb3, where it is
+  // supposed to end up. We solve this problem by wrapping the mismatching call
+  // with an inner try_table~end_try_table that sends the exception to the
+  // 'trampoline' block, which rethrows, or 'bounces' it to the right
+  // end_try_table:
+  // block
+  //   try_table (catch ... 0)
+  //     block exnref                       ;; (new)
+  //       block
+  //         try_table (catch ... 0)
+  //           call @foo
+  //           try_table (catch_all_ref 2)  ;; (new) to trampoline BB
+  //             call @bar
+  //           end_try_table                ;; (new)
+  //         end_try_table
+  //       end_block                        ;; ehpad (bb2)
+  //       ...
+  //     end_block                          ;; (new) trampoline BB
+  //     throw_ref                          ;; (new)
+  //   end_try_table
+  // end_block                              ;; ehpad (bb3)
+  //
+  // ---
+  // 2. The same as 1, but in this case an instruction unwinds to a caller
+  //    function and not another EH pad.
+  //
+  // Example: we have the following CFG:
+  // bb0:
+  //   call @foo       ; if it throws, unwind to bb2
+  // bb1:
+  //   call @bar       ; if it throws, unwind to caller
+  // bb2 (ehpad):
+  //   catch
+  //   ...
+  //
+  // And the CFG is sorted in this order. Then after placing TRY_TABLE markers
+  // (and BLOCK markers for the TRY_TABLE's destinations), it will look like:
+  // block
+  //   try_table (catch ... 0)
+  //     call @foo
+  //     call @bar              ;; if it throws, unwind to caller
+  //   end_try_table
+  // end_block                  ;; ehpad (bb2)
+  // ...
+  //
+  // Now if bar() throws, it is going to end up in bb2, when it is supposed to
+  // throw up to the caller. We solve this problem in the same way, but in this
+  // case 'delegate's immediate argument is the number of block depths + 1,
+  // which means it rethrows to the caller.
+  // block exnref                       ;; (new)
+  //   block
+  //     try_table (catch ... 0)
+  //       call @foo
+  //       try_table (catch_all_ref 2)  ;; (new) to trampoline BB
+  //         call @bar
+  //       end_try_table                ;; (new)
+  //     end_try_table
+  //   end_block                        ;; ehpad (bb2)
+  //   ...
+  // end_block                          ;; (new) caller trampoline BB
+  // throw_ref                          ;; (new) throw to the caller
+  //
+  // Before rewriteDepthImmediates, try_table's catch clauses' argument is a
+  // trampoline BB from which we throw_ref the exception to the right
+  // end_try_table. In case of the caller, it will take a new caller-dedicated
+  // trampoline BB generated by getCallerTrampolineBlock(), which throws the
+  // exception to the caller.
+  //
+  // In case there are multiple calls in a BB that may throw to the caller, they
+  // can be wrapped together in one nested try_table-end_try_table scope. (In 1,
+  // this couldn't happen, because may-throwing instruction there had an unwind
+  // destination, i.e., it was an invoke before, and there could be only one
+  // invoke within a BB.)
 
   SmallVector<const MachineBasicBlock *, 8> EHPadStack;
-  // Range of intructions to be wrapped in a new nested try~delegate. A range
-  // exists in a single BB and does not span multiple BBs.
+  // Range of instructions to be wrapped in a new nested try~delegate or
+  // try_table~end_try_table. A range exists in a single BB and does not span
+  // multiple BBs.
   using TryRange = std::pair<MachineInstr *, MachineInstr *>;
-  // In original CFG, <unwind destination BB, a vector of try ranges>
+  // In original CFG, <unwind destination BB, a vector of try/try_table ranges>
   DenseMap<MachineBasicBlock *, SmallVector<TryRange, 4>> UnwindDestToTryRanges;
 
   // Gather possibly throwing calls (i.e., previously invokes) whose current
@@ -1349,7 +1820,7 @@ bool WebAssemblyCFGStackify::fixCallUnwindMismatches(MachineFunction &MF) {
   for (auto &MBB : reverse(MF)) {
     bool SeenThrowableInstInBB = false;
     for (auto &MI : reverse(MBB)) {
-      if (MI.getOpcode() == WebAssembly::TRY)
+      if (WebAssembly::isTry(MI.getOpcode()))
         EHPadStack.pop_back();
       else if (WebAssembly::isCatch(MI.getOpcode()))
         EHPadStack.push_back(MI.getParent());
@@ -1454,7 +1925,7 @@ bool WebAssemblyCFGStackify::fixCallUnwindMismatches(MachineFunction &MF) {
       }
 
       // Update EHPadStack.
-      if (MI.getOpcode() == WebAssembly::TRY)
+      if (WebAssembly::isTry(MI.getOpcode()))
         EHPadStack.pop_back();
       else if (WebAssembly::isCatch(MI.getOpcode()))
         EHPadStack.push_back(MI.getParent());
@@ -1470,6 +1941,12 @@ bool WebAssemblyCFGStackify::fixCallUnwindMismatches(MachineFunction &MF) {
   if (UnwindDestToTryRanges.empty())
     return false;
 
+  // When end_loop is before end_try_table within the same BB in unwind
+  // destinations, we should split the end_loop into another BB.
+  if (WebAssembly::WasmEnableExnref)
+    for (auto &[UnwindDest, _] : UnwindDestToTryRanges)
+      splitEndLoopBB(UnwindDest);
+
   // Now we fix the mismatches by wrapping calls with inner try-delegates.
   for (auto &P : UnwindDestToTryRanges) {
     NumCallUnwindMismatches += P.second.size();
@@ -1483,9 +1960,10 @@ bool WebAssemblyCFGStackify::fixCallUnwindMismatches(MachineFunction &MF) {
 
       // If this BB has an EH pad successor, i.e., ends with an 'invoke', and if
       // the current range contains the invoke, now we are going to wrap the
-      // invoke with try-delegate, making the 'delegate' BB the new successor
-      // instead, so remove the EH pad succesor here. The BB may not have an EH
-      // pad successor if calls in this BB throw to the caller.
+      // invoke with try-delegate or try_table-end_try_table, making the
+      // 'delegate' or 'end_try_table' BB the new successor instead, so remove
+      // the EH pad successor here. The BB may not have an EH pad successor if
+      // calls in this BB throw to the caller.
       if (UnwindDest != getFakeCallerBlock(MF)) {
         MachineBasicBlock *EHPad = nullptr;
         for (auto *Succ : MBB->successors()) {
@@ -1498,14 +1976,43 @@ bool WebAssemblyCFGStackify::fixCallUnwindMismatches(MachineFunction &MF) {
           MBB->removeSuccessor(EHPad);
       }
 
-      addNestedTryDelegate(RangeBegin, RangeEnd, UnwindDest);
+      if (WebAssembly::WasmEnableExnref)
+        addNestedTryTable(RangeBegin, RangeEnd, UnwindDest);
+      else
+        addNestedTryDelegate(RangeBegin, RangeEnd, UnwindDest);
     }
   }
 
   return true;
 }
 
+// Returns the single destination of try_table, if there is one. All try_tables
+// we generate in this pass have a single destination, i.e., a single catch
+// clause.
+static MachineBasicBlock *getSingleUnwindDest(const MachineInstr *TryTable) {
+  if (TryTable->getOperand(1).getImm() != 1)
+    return nullptr;
+  switch (TryTable->getOperand(2).getImm()) {
+  case wasm::WASM_OPCODE_CATCH:
+  case wasm::WASM_OPCODE_CATCH_REF:
+    return TryTable->getOperand(4).getMBB();
+  case wasm::WASM_OPCODE_CATCH_ALL:
+  case wasm::WASM_OPCODE_CATCH_ALL_REF:
+    return TryTable->getOperand(3).getMBB();
+  default:
+    llvm_unreachable("try_table: Invalid catch clause\n");
+  }
+}
+
 bool WebAssemblyCFGStackify::fixCatchUnwindMismatches(MachineFunction &MF) {
+  // This function is used for both the legacy EH and the standard (exnref) EH,
+  // and the reason we have unwind mismatches is the same for the both of them,
+  // but the code examples in the comments are going to be different. To make
+  // the description less confusing, we write the basically same comments twice,
+  // once for the legacy EH and the standard EH.
+  //
+  // -- Legacy EH --------------------------------------------------------------
+  //
   // There is another kind of unwind destination mismatches besides call unwind
   // mismatches, which we will call "catch unwind mismatches". See this example
   // after the marker placement:
@@ -1543,6 +2050,60 @@ bool WebAssemblyCFGStackify::fixCatchUnwindMismatches(MachineFunction &MF) {
   // catch_all                 ;; ehpad B
   //   ...
   // end_try
+  //
+  // The right destination may be another EH pad or the caller. (The example
+  // here shows the case it is the caller.)
+  //
+  // -- Standard EH ------------------------------------------------------------
+  //
+  // There is another kind of unwind destination mismatches besides call unwind
+  // mismatches, which we will call "catch unwind mismatches". See this example
+  // after the marker placement:
+  // block
+  //   try_table (catch_all_ref 0)
+  //     block
+  //       try_table (catch ... 0)
+  //         call @foo
+  //       end_try_table
+  //     end_block                  ;; ehpad A (next unwind dest: caller)
+  //     ...
+  //   end_try_table
+  // end_block                      ;; ehpad B
+  // ...
+  //
+  // 'call @foo's unwind destination is the ehpad A. But suppose 'call @foo'
+  // throws a foreign exception that is not caught by ehpad A, and its next
+  // destination should be the caller. But after control flow linearization,
+  // another EH pad can be placed in between (e.g. ehpad B here), making the
+  // next unwind destination incorrect. In this case, the foreign exception will
+  // instead go to ehpad B and will be caught there instead. In this example the
+  // correct next unwind destination is the caller, but it can be another outer
+  // catch in other cases.
+  //
+  // There is no specific 'call' or 'throw' instruction to wrap with an inner
+  // try_table-end_try_table, so we wrap the whole try_table-end_try_table with
+  // an inner try_table-end_try_table that sends the exception to a trampoline
+  // BB. We rethrow the sent exception using a throw_ref to the right
+  // destination, which is the caller in the example below:
+  // block exnref
+  //   block
+  //     try_table (catch_all_ref 0)
+  //       try_table (catch_all_ref 2)  ;; (new) to trampoline
+  //         block
+  //           try_table (catch ... 0)
+  //             call @foo
+  //           end_try_table
+  //         end_block                  ;; ehpad A (next unwind dest: caller)
+  //       end_try_table                ;; (new)
+  //       ...
+  //     end_try_table
+  //   end_block                        ;; ehpad B
+  //   ...
+  // end_block                          ;; (new) caller trampoline BB
+  // throw_ref                          ;; (new) throw to the caller
+  //
+  // The right destination may be another EH pad or the caller. (The example
+  // here shows the case it is the caller.)
 
   const auto *EHInfo = MF.getWasmEHFuncInfo();
   assert(EHInfo);
@@ -1555,14 +2116,26 @@ bool WebAssemblyCFGStackify::fixCatchUnwindMismatches(MachineFunction &MF) {
     for (auto &MI : reverse(MBB)) {
       if (MI.getOpcode() == WebAssembly::TRY)
         EHPadStack.pop_back();
-      else if (MI.getOpcode() == WebAssembly::DELEGATE)
+      else if (MI.getOpcode() == WebAssembly::TRY_TABLE) {
+        // We want to exclude try_tables created in fixCallUnwindMismatches.
+        // Check if the try_table's unwind destination matches the EH pad stack
+        // top. If it is created in fixCallUnwindMismatches, it wouldn't.
+        if (getSingleUnwindDest(&MI) == EHPadStack.back())
+          EHPadStack.pop_back();
+      } else if (MI.getOpcode() == WebAssembly::DELEGATE)
         EHPadStack.push_back(&MBB);
       else if (WebAssembly::isCatch(MI.getOpcode())) {
         auto *EHPad = &MBB;
 
+        // If the BB has a catch pseudo instruction but is not marked as an EH
+        // pad, it's a trampoline BB we created in fixCallUnwindMismatches. Skip
+        // it.
+        if (!EHPad->isEHPad())
+          continue;
+
         // catch_all always catches an exception, so we don't need to do
         // anything
-        if (MI.getOpcode() == WebAssembly::CATCH_ALL_LEGACY) {
+        if (WebAssembly::isCatchAll(MI.getOpcode())) {
         }
 
         // This can happen when the unwind dest was removed during the
@@ -1604,16 +2177,29 @@ bool WebAssemblyCFGStackify::fixCatchUnwindMismatches(MachineFunction &MF) {
   assert(EHPadStack.empty());
   if (EHPadToUnwindDest.empty())
     return false;
+
+  // When end_loop is before end_try_table within the same BB in unwind
+  // destinations, we should split the end_loop into another BB.
+  for (auto &[_, UnwindDest] : EHPadToUnwindDest)
+    splitEndLoopBB(UnwindDest);
+
   NumCatchUnwindMismatches += EHPadToUnwindDest.size();
   SmallPtrSet<MachineBasicBlock *, 4> NewEndTryBBs;
 
   for (auto &[EHPad, UnwindDest] : EHPadToUnwindDest) {
     MachineInstr *Try = EHPadToTry[EHPad];
     MachineInstr *EndTry = BeginToEnd[Try];
-    addNestedTryDelegate(Try, EndTry, UnwindDest);
-    NewEndTryBBs.insert(EndTry->getParent());
+    if (WebAssembly::WasmEnableExnref) {
+      addNestedTryTable(Try, EndTry, UnwindDest);
+    } else {
+      addNestedTryDelegate(Try, EndTry, UnwindDest);
+      NewEndTryBBs.insert(EndTry->getParent());
+    }
   }
 
+  if (WebAssembly::WasmEnableExnref)
+    return true;
+
   // Adding a try-delegate wrapping an existing try-catch-end can make existing
   // branch destination BBs invalid. For example,
   //
@@ -1813,11 +2399,6 @@ void WebAssemblyCFGStackify::placeMarkers(MachineFunction &MF) {
     }
   }
 
-  // FIXME We return here temporarily until we implement fixing unwind
-  // mismatches for the new exnref proposal.
-  if (WebAssembly::WasmEnableExnref)
-    return;
-
   // Fix mismatches in unwind destinations induced by linearizing the code.
   if (MCAI->getExceptionHandlingType() == ExceptionHandling::Wasm &&
       MF.getFunction().hasPersonalityFn()) {
@@ -1937,9 +2518,6 @@ void WebAssemblyCFGStackify::rewriteDepthImmediates(MachineFunction &MF) {
   for (auto &MBB : reverse(MF)) {
     for (MachineInstr &MI : llvm::reverse(MBB)) {
       switch (MI.getOpcode()) {
-      case WebAssembly::TRY_TABLE:
-        RewriteOperands(MI);
-        [[fallthrough]];
       case WebAssembly::BLOCK:
       case WebAssembly::TRY:
         assert(ScopeTops[Stack.back().first->getNumber()]->getNumber() <=
@@ -1948,6 +2526,14 @@ void WebAssemblyCFGStackify::rewriteDepthImmediates(MachineFunction &MF) {
         Stack.pop_back();
         break;
 
+      case WebAssembly::TRY_TABLE:
+        assert(ScopeTops[Stack.back().first->getNumber()]->getNumber() <=
+                   MBB.getNumber() &&
+               "Block/try/try_table marker should be balanced");
+        Stack.pop_back();
+        RewriteOperands(MI);
+        break;
+
       case WebAssembly::LOOP:
         assert(Stack.back().first == &MBB && "Loop top should be balanced");
         Stack.pop_back();
@@ -1994,7 +2580,7 @@ void WebAssemblyCFGStackify::rewriteDepthImmediates(MachineFunction &MF) {
 void WebAssemblyCFGStackify::cleanupFunctionData(MachineFunction &MF) {
   if (FakeCallerBB)
     MF.deleteMachineBasicBlock(FakeCallerBB);
-  AppendixBB = FakeCallerBB = nullptr;
+  AppendixBB = FakeCallerBB = CallerTrampolineBB = nullptr;
 }
 
 void WebAssemblyCFGStackify::releaseMemory() {
@@ -2003,6 +2589,7 @@ void WebAssemblyCFGStackify::releaseMemory() {
   EndToBegin.clear();
   TryToEHPad.clear();
   EHPadToTry.clear();
+  UnwindDestToTrampoline.clear();
 }
 
 bool WebAssemblyCFGStackify::runOnMachineFunction(MachineFunction &MF) {
@@ -2023,7 +2610,7 @@ bool WebAssemblyCFGStackify::runOnMachineFunction(MachineFunction &MF) {
 
   // Remove unnecessary instructions possibly introduced by try/end_trys.
   if (MCAI->getExceptionHandlingType() == ExceptionHandling::Wasm &&
-      MF.getFunction().hasPersonalityFn())
+      MF.getFunction().hasPersonalityFn() && !WebAssembly::WasmEnableExnref)
     removeUnnecessaryInstrs(MF);
 
   // Convert MBB operands in terminators to relative depth immediates.

diff --git a/llvm/test/CodeGen/WebAssembly/cfg-stackify-eh.ll b/llvm/test/CodeGen/WebAssembly/cfg-stackify-eh.ll
new file mode 100644
index 00000000000000..6df626df08883f
--- /dev/null
+++ b/llvm/test/CodeGen/WebAssembly/cfg-stackify-eh.ll
@@ -0,0 +1,1555 @@
+; REQUIRES: asserts
+; RUN: llc < %s -disable-wasm-fallthrough-return-opt -disable-block-placement -verify-machineinstrs -fast-isel=false -machine-sink-split-probability-threshold=0 -cgp-freq-ratio-to-skip-merge=1000 -wasm-enable-eh -wasm-enable-exnref -exception-model=wasm -mattr=+exception-handling,bulk-memory | FileCheck %s
+; RUN: llc < %s -disable-wasm-fallthrough-return-opt -disable-block-placement -verify-machineinstrs -fast-isel=false -machine-sink-split-probability-threshold=0 -cgp-freq-ratio-to-skip-merge=1000 -wasm-enable-eh -wasm-enable-exnref -exception-model=wasm -mattr=+exception-handling,bulk-memory
+; RUN: llc < %s -O0 -disable-wasm-fallthrough-return-opt -verify-machineinstrs -wasm-enable-eh -wasm-enable-exnref -exception-model=wasm -mattr=+exception-handling,-bulk-memory | FileCheck %s --check-prefix=NOOPT
+; RUN: llc < %s -disable-wasm-fallthrough-return-opt -disable-block-placement -verify-machineinstrs -fast-isel=false -machine-sink-split-probability-threshold=0 -cgp-freq-ratio-to-skip-merge=1000 -wasm-enable-eh -wasm-enable-exnref -exception-model=wasm -mattr=+exception-handling,-bulk-memory -wasm-disable-ehpad-sort -stats 2>&1 | FileCheck %s --check-prefix=NOSORT
+; RUN: llc < %s -disable-wasm-fallthrough-return-opt -disable-block-placement -verify-machineinstrs -fast-isel=false -machine-sink-split-probability-threshold=0 -cgp-freq-ratio-to-skip-merge=1000 -wasm-enable-eh -wasm-enable-exnref -exception-model=wasm -mattr=+exception-handling,-bulk-memory -wasm-disable-ehpad-sort | FileCheck %s --check-prefix=NOSORT-LOCALS
+
+target triple = "wasm32-unknown-unknown"
+
+@_ZTIi = external constant ptr
+@_ZTId = external constant ptr
+
+%class.Object = type { i8 }
+%class.MyClass = type { i32 }
+
+; Simple test case with two catch clauses
+;
+; void foo();
+; void two_catches() {
+;   try {
+;     foo();
+;   } catch (int) {
+;   } catch (double) {
+;   }
+; }
+
+; CHECK-LABEL: two_catches:
+; CHECK: block
+; CHECK:   block     () -> (i32, exnref)
+; CHECK:     try_table    (catch_ref __cpp_exception 0) # 0: down to label[[L0:[0-9]+]]
+; CHECK:       call  foo
+; CHECK:       br        2                               # 2: down to label[[L1:[0-9]+]]
+; CHECK:     end_try_table
+; CHECK:   end_block                                     # label[[L0]]:
+; CHECK:   local.set  2
+; CHECK:   local.set  1
+; CHECK:   local.get  0
+; CHECK:   call  _Unwind_CallPersonality
+; CHECK:   block
+; CHECK:     br_if     0                                 # 0: down to label[[L2:[0-9]+]]
+; CHECK:     call  __cxa_begin_catch
+; CHECK:     call  __cxa_end_catch
+; CHECK:     br        1                                 # 1: down to label[[L1]]
+; CHECK:   end_block                                     # label[[L2]]:
+; CHECK:   block
+; CHECK:     br_if     0                                 # 0: down to label[[L3:[0-9]+]]
+; CHECK:     call  __cxa_begin_catch
+; CHECK:     call  __cxa_end_catch
+; CHECK:     br        1                                 # 1: down to label[[L1]]
+; CHECK:   end_block                                     # label[[L3]]:
+; CHECK:   throw_ref
+; CHECK: end_block                                       # label[[L1]]:
+define void @two_catches() personality ptr @__gxx_wasm_personality_v0 {
+entry:
+  invoke void @foo()
+          to label %try.cont unwind label %catch.dispatch
+
+catch.dispatch:                                   ; preds = %entry
+  %0 = catchswitch within none [label %catch.start] unwind to caller
+
+catch.start:                                      ; preds = %catch.dispatch
+  %1 = catchpad within %0 [ptr @_ZTIi, ptr @_ZTId]
+  %2 = call ptr @llvm.wasm.get.exception(token %1)
+  %3 = call i32 @llvm.wasm.get.ehselector(token %1)
+  %4 = call i32 @llvm.eh.typeid.for(ptr @_ZTIi)
+  %matches = icmp eq i32 %3, %4
+  br i1 %matches, label %catch2, label %catch.fallthrough
+
+catch2:                                           ; preds = %catch.start
+  %5 = call ptr @__cxa_begin_catch(ptr %2) [ "funclet"(token %1) ]
+  call void @__cxa_end_catch() [ "funclet"(token %1) ]
+  catchret from %1 to label %try.cont
+
+catch.fallthrough:                                ; preds = %catch.start
+  %6 = call i32 @llvm.eh.typeid.for(ptr @_ZTId)
+  %matches1 = icmp eq i32 %3, %6
+  br i1 %matches1, label %catch, label %rethrow
+
+catch:                                            ; preds = %catch.fallthrough
+  %7 = call ptr @__cxa_begin_catch(ptr %2) [ "funclet"(token %1) ]
+  call void @__cxa_end_catch() [ "funclet"(token %1) ]
+  catchret from %1 to label %try.cont
+
+rethrow:                                          ; preds = %catch.fallthrough
+  call void @llvm.wasm.rethrow() [ "funclet"(token %1) ]
+  unreachable
+
+try.cont:                                         ; preds = %catch, %catch2, %entry
+  ret void
+}
+
+; Nested try-catches within a catch
+; void nested_catch() {
+;   try {
+;     foo();
+;   } catch (int) {
+;     try {
+;       foo();
+;     } catch (int) {
+;       foo();
+;     }
+;   }
+; }
+
+; CHECK-LABEL: nested_catch:
+; CHECK: block     exnref
+; CHECK:   block
+; CHECK:     block     () -> (i32, exnref)
+; CHECK:       try_table    (catch_ref __cpp_exception 0)         # 0: down to label[[L0:[0-9]+]]
+; CHECK:         call  foo
+; CHECK:         br        2                                      # 2: down to label[[L1:[0-9]+]]
+; CHECK:       end_try_table
+; CHECK:     end_block                                            # label[[L0]]:
+; CHECK:     call  _Unwind_CallPersonality
+; CHECK:     block
+; CHECK:       block
+; CHECK:         br_if     0                                      # 0: down to label[[L2:[0-9]+]]
+; CHECK:         call  __cxa_begin_catch
+; CHECK:         block     exnref
+; CHECK:           try_table    (catch_all_ref 0)                 # 0: down to label[[L3:[0-9]+]]
+; CHECK:             block     () -> (i32, exnref)
+; CHECK:               try_table    (catch_ref __cpp_exception 0) # 0: down to label[[L4:[0-9]+]]
+; CHECK:                 call  foo
+; CHECK:                 br        5                              # 5: down to label[[L5:[0-9]+]]
+; CHECK:               end_try_table
+; CHECK:             end_block                                    # label[[L4]]:
+; CHECK:             call  _Unwind_CallPersonality
+; CHECK:             block
+; CHECK:               block
+; CHECK:                 br_if     0                              # 0: down to label[[L6:[0-9]+]]
+; CHECK:                 call  __cxa_begin_catch
+; CHECK:                 block     exnref
+; CHECK:                   try_table    (catch_all_ref 0)         # 0: down to label[[L7:[0-9]+]]
+; CHECK:                     call  foo
+; CHECK:                     br        3                          # 3: down to label[[L8:[0-9]+]]
+; CHECK:                   end_try_table
+; CHECK:                 end_block                                # label[[L7]]:
+; CHECK:                 try_table    (catch_all_ref 7)           # 7: down to label[[L9:[0-9]+]]
+; CHECK:                   call  __cxa_end_catch
+; CHECK:                 end_try_table
+; CHECK:                 throw_ref
+; CHECK:               end_block                                  # label[[L6]]:
+; CHECK:               throw_ref
+; CHECK:             end_block                                    # label[[L8]]:
+; CHECK:             try_table    (catch_all_ref 5)               # 5: down to label[[L9]]
+; CHECK:               call  __cxa_end_catch
+; CHECK:             end_try_table
+; CHECK:             br        3                                  # 3: down to label[[L5]]
+; CHECK:           end_try_table
+; CHECK:         end_block                                        # label[[L3]]:
+; CHECK:         call  __cxa_end_catch
+; CHECK:         throw_ref
+; CHECK:       end_block                                          # label[[L2]]:
+; CHECK:       throw_ref
+; CHECK:     end_block                                            # label[[L5]]:
+; CHECK:     call  __cxa_end_catch
+; CHECK:   end_block                                              # label[[L1]]:
+; CHECK:   return
+; CHECK: end_block                                                # label[[L9]]:
+; CHECK: throw_ref
+define void @nested_catch() personality ptr @__gxx_wasm_personality_v0 {
+entry:
+  invoke void @foo()
+          to label %try.cont11 unwind label %catch.dispatch
+
+catch.dispatch:                                   ; preds = %entry
+  %0 = catchswitch within none [label %catch.start] unwind to caller
+
+catch.start:                                      ; preds = %catch.dispatch
+  %1 = catchpad within %0 [ptr @_ZTIi]
+  %2 = call ptr @llvm.wasm.get.exception(token %1)
+  %3 = call i32 @llvm.wasm.get.ehselector(token %1)
+  %4 = call i32 @llvm.eh.typeid.for(ptr @_ZTIi)
+  %matches = icmp eq i32 %3, %4
+  br i1 %matches, label %catch, label %rethrow
+
+catch:                                            ; preds = %catch.start
+  %5 = call ptr @__cxa_begin_catch(ptr %2) [ "funclet"(token %1) ]
+  %6 = load i32, ptr %5, align 4
+  invoke void @foo() [ "funclet"(token %1) ]
+          to label %try.cont unwind label %catch.dispatch2
+
+catch.dispatch2:                                  ; preds = %catch
+  %7 = catchswitch within %1 [label %catch.start3] unwind label %ehcleanup9
+
+catch.start3:                                     ; preds = %catch.dispatch2
+  %8 = catchpad within %7 [ptr @_ZTIi]
+  %9 = call ptr @llvm.wasm.get.exception(token %8)
+  %10 = call i32 @llvm.wasm.get.ehselector(token %8)
+  %11 = call i32 @llvm.eh.typeid.for(ptr @_ZTIi)
+  %matches4 = icmp eq i32 %10, %11
+  br i1 %matches4, label %catch6, label %rethrow5
+
+catch6:                                           ; preds = %catch.start3
+  %12 = call ptr @__cxa_begin_catch(ptr %9) [ "funclet"(token %8) ]
+  %13 = load i32, ptr %12, align 4
+  invoke void @foo() [ "funclet"(token %8) ]
+          to label %invoke.cont8 unwind label %ehcleanup
+
+invoke.cont8:                                     ; preds = %catch6
+  call void @__cxa_end_catch() [ "funclet"(token %8) ]
+  catchret from %8 to label %try.cont
+
+rethrow5:                                         ; preds = %catch.start3
+  invoke void @llvm.wasm.rethrow() [ "funclet"(token %8) ]
+          to label %unreachable unwind label %ehcleanup9
+
+try.cont:                                         ; preds = %invoke.cont8, %catch
+  call void @__cxa_end_catch() [ "funclet"(token %1) ]
+  catchret from %1 to label %try.cont11
+
+rethrow:                                          ; preds = %catch.start
+  call void @llvm.wasm.rethrow() [ "funclet"(token %1) ]
+  unreachable
+
+try.cont11:                                       ; preds = %try.cont, %entry
+  ret void
+
+ehcleanup:                                        ; preds = %catch6
+  %14 = cleanuppad within %8 []
+  call void @__cxa_end_catch() [ "funclet"(token %14) ]
+  cleanupret from %14 unwind label %ehcleanup9
+
+ehcleanup9:                                       ; preds = %ehcleanup, %rethrow5, %catch.dispatch2
+  %15 = cleanuppad within %1 []
+  call void @__cxa_end_catch() [ "funclet"(token %15) ]
+  cleanupret from %15 unwind to caller
+
+unreachable:                                      ; preds = %rethrow5
+  unreachable
+}
+
+; Nested try-catches within a try
+; void nested_try() {
+;   try {
+;     try {
+;       foo();
+;     } catch (...) {
+;     }
+;   } catch (...) {
+;   }
+; }
+
+; CHECK-LABEL: nested_try:
+; CHECK: block
+; CHECK:   block     i32
+; CHECK:     try_table    (catch __cpp_exception 0)   # 0: down to label[[L0:[0-9]+]]
+; CHECK:     block     i32
+; CHECK:       try_table    (catch __cpp_exception 0) # 0: down to label[[L1:[0-9]+]]
+; CHECK:         call  foo
+; CHECK:         br        4                          # 4: down to label[[L2:[0-9]+]]
+; CHECK:       end_try_table
+; CHECK:     end_block                                # label[[L1]]:
+; CHECK:     call  __cxa_begin_catch
+; CHECK:     call  __cxa_end_catch
+; CHECK:     br        2                              # 2: down to label[[L2]]
+; CHECK:     end_try_table
+; CHECK:   end_block                                  # label[[L0]]:
+; CHECK:   call  __cxa_begin_catch
+; CHECK:   call  __cxa_end_catch
+; CHECK: end_block                                    # label[[L2]]:
+define void @nested_try() personality ptr @__gxx_wasm_personality_v0 {
+entry:
+  invoke void @foo()
+          to label %try.cont7 unwind label %catch.dispatch
+
+catch.dispatch:                                   ; preds = %entry
+  %0 = catchswitch within none [label %catch.start] unwind label %catch.dispatch2
+
+catch.start:                                      ; preds = %catch.dispatch
+  %1 = catchpad within %0 [ptr null]
+  %2 = call ptr @llvm.wasm.get.exception(token %1)
+  %3 = call i32 @llvm.wasm.get.ehselector(token %1)
+  %4 = call ptr @__cxa_begin_catch(ptr %2) [ "funclet"(token %1) ]
+  invoke void @__cxa_end_catch() [ "funclet"(token %1) ]
+          to label %invoke.cont1 unwind label %catch.dispatch2
+
+catch.dispatch2:                                  ; preds = %catch.start, %catch.dispatch
+  %5 = catchswitch within none [label %catch.start3] unwind to caller
+
+catch.start3:                                     ; preds = %catch.dispatch2
+  %6 = catchpad within %5 [ptr null]
+  %7 = call ptr @llvm.wasm.get.exception(token %6)
+  %8 = call i32 @llvm.wasm.get.ehselector(token %6)
+  %9 = call ptr @__cxa_begin_catch(ptr %7) [ "funclet"(token %6) ]
+  call void @__cxa_end_catch() [ "funclet"(token %6) ]
+  catchret from %6 to label %try.cont7
+
+try.cont7:                                        ; preds = %entry, %invoke.cont1, %catch.start3
+  ret void
+
+invoke.cont1:                                     ; preds = %catch.start
+  catchret from %1 to label %try.cont7
+}
+
+
+; CHECK-LABEL: loop_within_catch:
+; CHECK: block
+; CHECK:   block     i32
+; CHECK:     try_table    (catch __cpp_exception 0) # 0: down to label[[L0:[0-9]+]]
+; CHECK:       call  foo
+; CHECK:       br        2                          # 2: down to label[[L1:[0-9]+]]
+; CHECK:     end_try_table
+; CHECK:   end_block                                # label[[L0]]:
+; CHECK:   call  __cxa_begin_catch
+; CHECK:   loop                                     # label[[L2:[0-9]+]]:
+; CHECK:     block
+; CHECK:       block
+; CHECK:         br_if     0                        # 0: down to label[[L3:[0-9]+]]
+; CHECK:         block     exnref
+; CHECK:           try_table    (catch_all_ref 0)   # 0: down to label[[L4:[0-9]+]]
+; CHECK:             call  foo
+; CHECK:             br        3                    # 3: down to label[[L5:[0-9]+]]
+; CHECK:           end_try_table
+; CHECK:         end_block                          # label[[L4]]:
+; CHECK:         block
+; CHECK:           block
+; CHECK:             try_table    (catch_all 0)     # 0: down to label[[L6:[0-9]+]]
+; CHECK:               call  __cxa_end_catch
+; CHECK:               br        2                  # 2: down to label[[L7:[0-9]+]]
+; CHECK:             end_try_table
+; CHECK:           end_block                        # label[[L6]]:
+; CHECK:           call  _ZSt9terminatev
+; CHECK:           unreachable
+; CHECK:         end_block                          # label[[L7]]:
+; CHECK:         throw_ref
+; CHECK:       end_block                            # label[[L3]]:
+; CHECK:       call  __cxa_end_catch
+; CHECK:       br        2                          # 2: down to label[[L1]]
+; CHECK:     end_block                              # label[[L5]]:
+; CHECK:     br        0                            # 0: up to label[[L2]]
+; CHECK:   end_loop
+; CHECK: end_block                                  # label[[L1]]:
+define void @loop_within_catch() personality ptr @__gxx_wasm_personality_v0 {
+entry:
+  invoke void @foo()
+          to label %try.cont unwind label %catch.dispatch
+
+catch.dispatch:                                   ; preds = %entry
+  %0 = catchswitch within none [label %catch.start] unwind to caller
+
+catch.start:                                      ; preds = %catch.dispatch
+  %1 = catchpad within %0 [ptr null]
+  %2 = call ptr @llvm.wasm.get.exception(token %1)
+  %3 = call i32 @llvm.wasm.get.ehselector(token %1)
+  %4 = call ptr @__cxa_begin_catch(ptr %2) [ "funclet"(token %1) ]
+  br label %for.cond
+
+for.cond:                                         ; preds = %for.inc, %catch.start
+  %i.0 = phi i32 [ 0, %catch.start ], [ %inc, %for.inc ]
+  %cmp = icmp slt i32 %i.0, 50
+  br i1 %cmp, label %for.body, label %for.end
+
+for.body:                                         ; preds = %for.cond
+  invoke void @foo() [ "funclet"(token %1) ]
+          to label %for.inc unwind label %ehcleanup
+
+for.inc:                                          ; preds = %for.body
+  %inc = add nsw i32 %i.0, 1
+  br label %for.cond
+
+for.end:                                          ; preds = %for.cond
+  call void @__cxa_end_catch() [ "funclet"(token %1) ]
+  catchret from %1 to label %try.cont
+
+try.cont:                                         ; preds = %for.end, %entry
+  ret void
+
+ehcleanup:                                        ; preds = %for.body
+  %5 = cleanuppad within %1 []
+  invoke void @__cxa_end_catch() [ "funclet"(token %5) ]
+          to label %invoke.cont2 unwind label %terminate
+
+invoke.cont2:                                     ; preds = %ehcleanup
+  cleanupret from %5 unwind to caller
+
+terminate:                                        ; preds = %ehcleanup
+  %6 = cleanuppad within %5 []
+  call void @_ZSt9terminatev() [ "funclet"(token %6) ]
+  unreachable
+}
+
+; Tests if block and try_table markers are correctly placed. Even if two
+; predecessors of the EH pad are bb2 and bb3 and their nearest common dominator
+; is bb1, the TRY_TABLE marker should be placed at bb0 because there's a branch
+; from bb0 to bb2, and scopes cannot be interleaved.
+; NOOPT-LABEL: block_try_table_markers:
+; NOOPT: block
+; NOOPT:   block     i32
+; NOOPT:   try_table    (catch __cpp_exception 0)
+; NOOPT:     block
+; NOOPT:       block
+; NOOPT:         block
+; NOOPT:         end_block
+; NOOPT:       end_block
+; NOOPT:       call  foo
+; NOOPT:     end_block
+; NOOPT:     call  bar
+; NOOPT:   end_try_table
+; NOOPT:   end_block
+; NOOPT: end_block
+define void @block_try_table_markers() personality ptr @__gxx_wasm_personality_v0 {
+bb0:
+  br i1 undef, label %bb1, label %bb2
+
+bb1:                                              ; preds = %bb0
+  br i1 undef, label %bb3, label %bb4
+
+bb2:                                              ; preds = %bb0
+  br label %try.cont
+
+bb3:                                              ; preds = %bb1
+  invoke void @foo()
+          to label %try.cont unwind label %catch.dispatch
+
+bb4:                                              ; preds = %bb1
+  invoke void @bar()
+          to label %try.cont unwind label %catch.dispatch
+
+catch.dispatch:                                   ; preds = %bb4, %bb3
+  %0 = catchswitch within none [label %catch.start] unwind to caller
+
+catch.start:                                      ; preds = %catch.dispatch
+  %1 = catchpad within %0 [ptr null]
+  %2 = call ptr @llvm.wasm.get.exception(token %1)
+  %3 = call i32 @llvm.wasm.get.ehselector(token %1)
+  catchret from %1 to label %try.cont
+
+try.cont:                                         ; preds = %catch.start, %bb4, %bb3, %bb2
+  ret void
+}
+
+; Tests if try_table/end_try_table markers are placed correctly wrt
+; loop/end_loop markers, when try_table and loop markers are in the same BB and
+; end_try_table and end_loop are in another BB.
+; CHECK-LABEL: loop_try_table_markers:
+; CHECK: loop
+; CHECK:   block     i32
+; CHECK:     try_table    (catch __cpp_exception 0)
+; CHECK:       call  foo
+; CHECK:     end_try_table
+; CHECK:   end_block
+; CHECK: end_loop
+define void @loop_try_table_markers(ptr %p) personality ptr @__gxx_wasm_personality_v0 {
+entry:
+  store volatile i32 0, ptr %p
+  br label %loop
+
+loop:                                             ; preds = %try.cont, %entry
+  store volatile i32 1, ptr %p
+  invoke void @foo()
+          to label %try.cont unwind label %catch.dispatch
+
+catch.dispatch:                                   ; preds = %loop
+  %0 = catchswitch within none [label %catch.start] unwind to caller
+
+catch.start:                                      ; preds = %catch.dispatch
+  %1 = catchpad within %0 [ptr null]
+  %2 = call ptr @llvm.wasm.get.exception(token %1)
+  %3 = call i32 @llvm.wasm.get.ehselector(token %1)
+  catchret from %1 to label %try.cont
+
+try.cont:                                         ; preds = %catch.start, %loop
+  br label %loop
+}
+
+; Some of test cases below are hand-tweaked by deleting some library calls to
+; simplify tests and changing the order of basic blocks to cause unwind
+; destination mismatches. And we use -wasm-disable-ehpad-sort to create maximum
+; number of mismatches in several tests below.
+
+; - Call unwind mismatch
+; 'call bar''s original unwind destination was 'C0', but after control flow
+; linearization, its unwind destination incorrectly becomes 'C1'. We fix this by
+; wrapping the call with a nested try_table-end_try_table that targets 'C0'.
+; - Catch unwind mismatch
+; If 'call foo' throws a foreign exception, it will not be caught by C1, and
+; should be rethrown to the caller. But after control flow linearization, it
+; will instead unwind to C0, an incorrect next EH pad. We wrap the whole
+; try_table-end_try_table with another try_table-end_try_table that jumps to a
+; trampoline BB, from which we rethrow the exception to the caller to fix this.
+
+; NOSORT-LABEL: unwind_mismatches_0:
+; NOSORT: block     exnref
+; NOSORT:   block
+; NOSORT:     block     i32
+; NOSORT:       try_table    (catch __cpp_exception 0)         # 0: down to label[[L0:[0-9]+]]
+; NOSORT:         block     exnref
+; NOSORT:           block     i32
+; --- nested try_table-end_try_table starts (catch unwind mismatch)
+; NOSORT:             try_table    (catch_all_ref 5)           # 5: down to label[[L1:[0-9]+]]
+; NOSORT:               try_table    (catch __cpp_exception 1) # 1: down to label[[L2:[0-9]+]]
+; NOSORT:                 call  foo
+; --- nested try_table-end_try_table starts (call unwind mismatch)
+; NOSORT:                 try_table    (catch_all_ref 3)       # 3: down to label[[L3:[0-9]+]]
+; NOSORT:                   call  bar
+; NOSORT:                 end_try_table
+; --- nested try_table-end_try_table ends (call unwind mismatch)
+; NOSORT:               end_try_table
+; NOSORT:             end_try_table
+; --- nested try_table-end_try_table ends (catch unwind mismatch)
+; NOSORT:           end_block                                  # label[[L2]]:
+; NOSORT:         end_block                                    # label[[L3]]:
+; NOSORT:         throw_ref
+; NOSORT:       end_try_table
+; NOSORT:     end_block                                        # label[[L0]]:
+; NOSORT:   end_block
+; NOSORT:   return
+; NOSORT: end_block                                            # label[[L1]]:
+; NOSORT: throw_ref
+define void @unwind_mismatches_0() personality ptr @__gxx_wasm_personality_v0 {
+bb0:
+  invoke void @foo()
+          to label %bb1 unwind label %catch.dispatch0
+
+bb1:                                              ; preds = %bb0
+  invoke void @bar()
+          to label %try.cont unwind label %catch.dispatch1
+
+catch.dispatch0:                                  ; preds = %bb0
+  %0 = catchswitch within none [label %catch.start0] unwind to caller
+
+catch.start0:                                     ; preds = %catch.dispatch0
+  %1 = catchpad within %0 [ptr null]
+  %2 = call ptr @llvm.wasm.get.exception(token %1)
+  %3 = call i32 @llvm.wasm.get.ehselector(token %1)
+  catchret from %1 to label %try.cont
+
+catch.dispatch1:                                  ; preds = %bb1
+  %4 = catchswitch within none [label %catch.start1] unwind to caller
+
+catch.start1:                                     ; preds = %catch.dispatch1
+  %5 = catchpad within %4 [ptr null]
+  %6 = call ptr @llvm.wasm.get.exception(token %5)
+  %7 = call i32 @llvm.wasm.get.ehselector(token %5)
+  catchret from %5 to label %try.cont
+
+try.cont:                                         ; preds = %catch.start1, %catch.start0, %bb1
+  ret void
+}
+
+; 'call bar' and 'call baz''s original unwind destination was the caller, but
+; after control flow linearization, their unwind destination incorrectly becomes
+; 'C0'. We fix this by wrapping the calls with a nested try_table-end_try_table
+; that jumps to a trampoline BB where we rethrow the exception to the caller.
+
+; And the return value of 'baz' should NOT be stackified because the BB is split
+; during fixing unwind mismatches.
+
+; NOSORT-LABEL: unwind_mismatches_1:
+; NOSORT: block     exnref
+; NOSORT:   block     i32
+; NOSORT:     try_table    (catch __cpp_exception 0) # 0: down to label[[L0:[0-9]+]]
+; NOSORT:       call  foo
+; --- nested try_table-end_try_table starts (call unwind mismatch)
+; NOSORT:       try_table    (catch_all_ref 2)       # 2: down to label[[L1:[0-9]+]]
+; NOSORT:         call  bar
+; NOSORT:         call  baz
+; NOSORT:         local.set [[LOCAL:[0-9]+]]
+; NOSORT:       end_try_table
+; --- nested try_table-end_try_table ends (call unwind mismatch)
+; NOSORT:       local.get [[LOCAL]]
+; NOSORT:       call  nothrow
+; NOSORT:       return
+; NOSORT:     end_try_table
+; NOSORT:   end_block                                # label[[L0]]:
+; NOSORT:   return
+; NOSORT: end_block                                  # label[[L1]]:
+; NOSORT: throw_ref
+define void @unwind_mismatches_1() personality ptr @__gxx_wasm_personality_v0 {
+bb0:
+  invoke void @foo()
+          to label %bb1 unwind label %catch.dispatch0
+
+bb1:                                              ; preds = %bb0
+  call void @bar()
+  %call = call i32 @baz()
+  call void @nothrow(i32 %call) #0
+  ret void
+
+catch.dispatch0:                                  ; preds = %bb0
+  %0 = catchswitch within none [label %catch.start0] unwind to caller
+
+catch.start0:                                     ; preds = %catch.dispatch0
+  %1 = catchpad within %0 [ptr null]
+  %2 = call ptr @llvm.wasm.get.exception(token %1)
+  %3 = call i32 @llvm.wasm.get.ehselector(token %1)
+  catchret from %1 to label %try.cont
+
+try.cont:                                         ; preds = %catch.start0
+  ret void
+}
+
+; The same as unwind_mismatches_0, but we have one more call 'call @foo' in bb1
+; which unwinds to the caller. In this case bb1 has two call unwind mismatches:
+; 'call @foo' unwinds to the caller and 'call @bar' unwinds to catch C0.
+
+; NOSORT-LABEL: unwind_mismatches_2:
+; NOSORT: block     exnref
+; NOSORT:   block
+; NOSORT:     block     i32
+; NOSORT:       try_table    (catch __cpp_exception 0)         # 0: down to label[[L0:[0-9]+]]
+; NOSORT:         block     exnref
+; NOSORT:           block     i32
+; --- nested try_table-end_try_table starts (catch unwind mismatch)
+; NOSORT:             try_table    (catch_all_ref 5)           # 5: down to label[[L1:[0-9]+]]
+; NOSORT:               try_table    (catch __cpp_exception 1) # 1: down to label[[L2:[0-9]+]]
+; NOSORT:                 call  foo
+; --- nested try_table-end_try_table starts (call unwind mismatch)
+; NOSORT:                 try_table    (catch_all_ref 7)       # 7: down to label[[L1]]
+; NOSORT:                   call  foo
+; NOSORT:                 end_try_table
+; --- nested try_table-end_try_table ends (call unwind mismatch)
+; --- nested try_table-end_try_table starts (call unwind mismatch)
+; NOSORT:                 try_table    (catch_all_ref 3)       # 3: down to label[[L3:[0-9]+]]
+; NOSORT:                   call  bar
+; NOSORT:                 end_try_table
+; --- nested try_table-end_try_table ends (call unwind mismatch)
+; NOSORT:               end_try_table
+; NOSORT:             end_try_table
+; --- nested try_table-end_try_table ends (catch unwind mismatch)
+; NOSORT:           end_block                                  # label[[L2]]:
+; NOSORT:         end_block                                    # label[[L3]]:
+; NOSORT:         throw_ref
+; NOSORT:       end_try_table
+; NOSORT:     end_block                                        # label[[L0]]:
+; NOSORT:   end_block
+; NOSORT:   return
+; NOSORT: end_block                                            # label[[L1]]:
+; NOSORT: throw_ref
+define void @unwind_mismatches_2() personality ptr @__gxx_wasm_personality_v0 {
+bb0:
+  invoke void @foo()
+          to label %bb1 unwind label %catch.dispatch0
+
+bb1:                                              ; preds = %bb0
+  call void @foo()
+  invoke void @bar()
+          to label %try.cont unwind label %catch.dispatch1
+
+catch.dispatch0:                                  ; preds = %bb0
+  %0 = catchswitch within none [label %catch.start0] unwind to caller
+
+catch.start0:                                     ; preds = %catch.dispatch0
+  %1 = catchpad within %0 [ptr null]
+  %2 = call ptr @llvm.wasm.get.exception(token %1)
+  %3 = call i32 @llvm.wasm.get.ehselector(token %1)
+  catchret from %1 to label %try.cont
+
+catch.dispatch1:                                  ; preds = %bb1
+  %4 = catchswitch within none [label %catch.start1] unwind to caller
+
+catch.start1:                                     ; preds = %catch.dispatch1
+  %5 = catchpad within %4 [ptr null]
+  %6 = call ptr @llvm.wasm.get.exception(token %5)
+  %7 = call i32 @llvm.wasm.get.ehselector(token %5)
+  catchret from %5 to label %try.cont
+
+try.cont:                                         ; preds = %catch.start1, %catch.start0, %bb1
+  ret void
+}
+
+; Similar situation to @unwind_mismatches_1. Here 'call @qux''s original unwind
+; destination was the caller, but after control flow linearization, its unwind
+; destination incorrectly becomes 'C0' within the function. We fix this by
+; wrapping the call with a nested try_table-end_try_table that jumps to a
+; trampoline BB where we rethrow the exception to the caller.
+
+; Because 'call @qux' pops an argument pushed by 'i32.const 5' from stack, the
+; nested 'try_table' should be placed before 'i32.const 5', not between
+; 'i32.const 5' and 'call @qux'.
+
+; NOSORT-LABEL: unwind_mismatches_3:
+; NOSORT: block     exnref
+; NOSORT:   block     i32
+; NOSORT:     try_table    (catch __cpp_exception 0) # 0: down to label[[L0:[0-9]+]]
+; NOSORT:       call  foo
+; --- nested try_table-end_try_table starts (call unwind mismatch)
+; NOSORT:       try_table    (catch_all_ref 2)      # 2: down to label[[L1:[0-9]+]]
+; NOSORT:         i32.const  5
+; NOSORT:         call  qux
+; NOSORT:       end_try_table
+; --- nested try_table-end_try_table ends (call unwind mismatch)
+; NOSORT:       return
+; NOSORT:     end_try_table
+; NOSORT:   end_block                               # label[[L0]]:
+; NOSORT:   return
+; NOSORT: end_block                               # label[[L1]]:
+; NOSORT: throw_ref
+define i32 @unwind_mismatches_3() personality ptr @__gxx_wasm_personality_v0 {
+bb0:
+  invoke void @foo()
+          to label %bb1 unwind label %catch.dispatch0
+
+bb1:                                              ; preds = %bb0
+  %0 = call i32 @qux(i32 5)
+  ret i32 %0
+
+catch.dispatch0:                                  ; preds = %bb0
+  %1 = catchswitch within none [label %catch.start0] unwind to caller
+
+catch.start0:                                     ; preds = %catch.dispatch0
+  %2 = catchpad within %1 [ptr null]
+  %3 = call ptr @llvm.wasm.get.exception(token %2)
+  %j = call i32 @llvm.wasm.get.ehselector(token %2)
+  catchret from %2 to label %try.cont
+
+try.cont:                                         ; preds = %catch.start0
+  ret i32 0
+}
+
+; We have two call unwind mismatches:
+; - A may-throw instruction unwinds to an incorrect EH pad after linearizing the
+;   CFG, when it is supposed to unwind to another EH pad.
+; - A may-throw instruction unwinds to an incorrect EH pad after linearizing the
+;   CFG, when it is supposed to unwind to the caller.
+; We also have a catch unwind mismatch: If an exception is not caught by the
+; first catch because it is a non-C++ exception, it shouldn't unwind to the next
+; catch, but it should unwind to the caller.
+
+; NOSORT-LABEL: unwind_mismatches_4:
+; NOSORT: block     exnref
+; NOSORT:   block
+; NOSORT:     block     i32
+; NOSORT:       try_table    (catch __cpp_exception 0)       # 0: down to label[[L0:[0-9]+]]
+; NOSORT:         block     exnref
+; NOSORT:           block     i32
+; --- nested try_table-end_try_table starts (catch unwind mismatch)
+; NOSORT:           try_table    (catch_all_ref 5)           # 5: down to label[[L1:[0-9]+]]
+; NOSORT:             try_table    (catch __cpp_exception 1) # 1: down to label[[L2:[0-9]+]]
+; NOSORT:               call  foo
+; --- nested try_table-end_try_table starts (call unwind mismatch)
+; NOSORT:               try_table    (catch_all_ref 3)       # 3: down to label[[L3:[0-9]+]]
+; NOSORT:                 call  bar
+; NOSORT:               end_try_table
+; --- nested try_table-end_try_table ends (call unwind mismatch)
+; NOSORT:             end_try_table
+; NOSORT:           end_try_table
+; --- nested try_table-end_try_table ends (catch unwind mismatch)
+; NOSORT:           end_block                                # label[[L2]]:
+; NOSORT:           call  __cxa_begin_catch
+; --- nested try_table-end_try_table starts (call unwind mismatch)
+; NOSORT:           try_table    (catch_all_ref 4)           # 4: down to label[[L1]]
+; NOSORT:             call  __cxa_end_catch
+; NOSORT:           end_try_table
+; --- nested try_table-end_try_table ends (call unwind mismatch)
+; NOSORT:         end_block                                  # label[[L3]]:
+; NOSORT:         throw_ref
+; NOSORT:       end_try_table
+; NOSORT:     end_block                                      # label[[L0]]:
+; NOSORT:     call  __cxa_begin_catch
+; NOSORT:     call  __cxa_end_catch
+; NOSORT:   end_block                                        # label74:
+; NOSORT:   return
+; NOSORT: end_block                                          # label[[L1]]:
+; NOSORT: throw_ref
+define void @unwind_mismatches_4() personality ptr @__gxx_wasm_personality_v0 {
+bb0:
+  invoke void @foo()
+          to label %bb1 unwind label %catch.dispatch0
+
+bb1:                                              ; preds = %bb0
+  invoke void @bar()
+          to label %try.cont unwind label %catch.dispatch1
+
+catch.dispatch0:                                  ; preds = %bb0
+  %0 = catchswitch within none [label %catch.start0] unwind to caller
+
+catch.start0:                                     ; preds = %catch.dispatch0
+  %1 = catchpad within %0 [ptr null]
+  %2 = call ptr @llvm.wasm.get.exception(token %1)
+  %3 = call i32 @llvm.wasm.get.ehselector(token %1)
+  %4 = call ptr @__cxa_begin_catch(ptr %2) [ "funclet"(token %1) ]
+  call void @__cxa_end_catch() [ "funclet"(token %1) ]
+  catchret from %1 to label %try.cont
+
+catch.dispatch1:                                  ; preds = %bb1
+  %5 = catchswitch within none [label %catch.start1] unwind to caller
+
+catch.start1:                                     ; preds = %catch.dispatch1
+  %6 = catchpad within %5 [ptr null]
+  %7 = call ptr @llvm.wasm.get.exception(token %6)
+  %8 = call i32 @llvm.wasm.get.ehselector(token %6)
+  %9 = call ptr @__cxa_begin_catch(ptr %7) [ "funclet"(token %6) ]
+  call void @__cxa_end_catch() [ "funclet"(token %6) ]
+  catchret from %6 to label %try.cont
+
+try.cont:                                         ; preds = %catch.start1, %catch.start0, %bb1
+  ret void
+}
+
+; This crashed when the EHPadStack update in fixCallUnwindMismatches had a bug.
+; It should not crash, and a nested try_table-end_try_table has to be created
+; around 'call @baz': the initial TRY_TABLE for 'call @quux' was placed before
+; 'call @baz' because 'call @baz''s return value is stackified.
+
+; CHECK-LABEL: unwind_mismatches_5:
+; CHECK: block     exnref
+; CHECK:   block
+; CHECK:     block     exnref
+; CHECK:       try_table    (catch_all_ref 0)    # 0: down to label[[L0:[0-9]+]]
+; --- nested try_table-end_try_table starts (call unwind mismatch)
+; CHECK:         try_table    (catch_all_ref 3)  # 3: down to label[[L1:[0-9]+]]
+; CHECK:           call  baz
+; CHECK:         end_try_table
+; --- nested try_table-end_try_table ends (call unwind mismatch)
+; CHECK:         call  quux
+; CHECK:       end_try_table
+; CHECK:     end_block                           # label[[L0]]:
+; CHECK:     throw_ref
+; CHECK:   end_block
+; CHECK:   unreachable
+; CHECK: end_block                               # label[[L1]]:
+; CHECK: throw_ref
+define void @unwind_mismatches_5() personality ptr @__gxx_wasm_personality_v0 {
+entry:
+  %call = call i32 @baz()
+  invoke void @quux(i32 %call)
+          to label %invoke.cont unwind label %ehcleanup
+
+ehcleanup:                                        ; preds = %entry
+  %0 = cleanuppad within none []
+  cleanupret from %0 unwind to caller
+
+invoke.cont:                                      ; preds = %entry
+  unreachable
+}
+
+; The structure is similar to unwind_mismatches_0, where the call to 'bar''s
+; original unwind destination is catch.dispatch1 but after placing markers it
+; unwinds to catch.dispatch0, which we fix. This additionally has a loop before
+; the real unwind destination (catch.dispatch1). This makes sure the code
+; generation works when the unwind destination has an end_loop before
+; end_try_table before the mismatch fixing.
+
+; NOSORT-LABEL: unwind_mismatches_with_loop:
+; NOSORT: block     exnref
+; NOSORT:   block     i32
+; NOSORT:     try_table    (catch __cpp_exception 0)           # 0: down to label[[L0:[0-9]+]]
+; NOSORT:       block     exnref
+; NOSORT:         block
+; NOSORT:           block     i32
+; --- nested try_table-end_try_table starts (catch unwind mismatch)
+; NOSORT:             try_table    (catch_all_ref 5)           # 5: down to label[[L1:[0-9]+]]
+; NOSORT:               try_table    (catch __cpp_exception 1) # 1: down to label[[L2:[0-9]+]]
+; NOSORT:                 call  foo
+; --- nested try_table-end_try_table starts (call unwind mismatch)
+; NOSORT:                 try_table    (catch_all_ref 4)       # 4: down to label[[L3:[0-9]+]]
+; NOSORT:                   call  bar
+; NOSORT:                 end_try_table
+; --- nested try_table-end_try_table ends (call unwind mismatch)
+; NOSORT:               end_try_table
+; NOSORT:             end_try_table
+; --- nested try_table-end_try_table ends (catch unwind mismatch)
+; NOSORT:           end_block                                  # label[[L2]]:
+; NOSORT:         end_block
+; NOSORT:         loop
+; NOSORT:           call  foo
+; NOSORT:         end_loop
+; NOSORT:       end_block                                      # label[[L3]]:
+; NOSORT:       throw_ref
+; NOSORT:     end_try_table
+; NOSORT:   end_block                                          # label[[L0]]:
+; NOSORT:   return
+; NOSORT: end_block                                            # label[[L1]]:
+; NOSORT: throw_ref
+define void @unwind_mismatches_with_loop() personality ptr @__gxx_wasm_personality_v0 {
+bb0:
+  invoke void @foo()
+          to label %bb1 unwind label %catch.dispatch0
+
+bb1:                                              ; preds = %bb0
+  invoke void @bar()
+          to label %bb2 unwind label %catch.dispatch1
+
+catch.dispatch0:                                  ; preds = %bb0
+  %0 = catchswitch within none [label %catch.start0] unwind to caller
+
+catch.start0:                                     ; preds = %catch.dispatch0
+  %1 = catchpad within %0 [ptr null]
+  %2 = call ptr @llvm.wasm.get.exception(token %1)
+  %3 = call i32 @llvm.wasm.get.ehselector(token %1)
+  catchret from %1 to label %bb2
+
+bb2:
+  invoke void @foo()
+          to label %bb3 unwind label %catch.dispatch1
+
+bb3:                                              ; preds = %bb2
+  br label %bb2
+
+catch.dispatch1:                                  ; preds = %bb2, %bb1
+  %4 = catchswitch within none [label %catch.start1] unwind to caller
+
+catch.start1:                                     ; preds = %catch.dispatch1
+  %5 = catchpad within %4 [ptr null]
+  %6 = call ptr @llvm.wasm.get.exception(token %5)
+  %7 = call i32 @llvm.wasm.get.ehselector(token %5)
+  catchret from %5 to label %try.cont
+
+try.cont:                                         ; preds = %catch.start1
+  ret void
+}
+
+; Tests the case when TEE stackifies a register in RegStackify but it gets
+; unstackified in fixCallUnwindMismatches in CFGStackify.
+
+; NOSORT-LOCALS-LABEL: unstackify_when_fixing_unwind_mismatch:
+define void @unstackify_when_fixing_unwind_mismatch(i32 %x) personality ptr @__gxx_wasm_personality_v0 {
+bb0:
+  invoke void @foo()
+          to label %bb1 unwind label %catch.dispatch0
+
+bb1:                                              ; preds = %bb0
+  %t = add i32 %x, 4
+  ; This %addr is used in multiple places, so tee is introduced in RegStackify,
+  ; which stackifies the use of %addr in store instruction. A tee has two dest
+  ; registers, the first of which is stackified and the second is not.
+  ; But when we introduce a nested try_table-end_try_table in
+  ; fixCallUnwindMismatches in CFGStackify, we end up unstackifying the first
+  ; dest register. In that case, we convert that tee into a copy.
+  %addr = inttoptr i32 %t to ptr
+  %load = load i32, ptr %addr
+  %call = call i32 @baz()
+  %add = add i32 %load, %call
+  store i32 %add, ptr %addr
+  ret void
+; NOSORT-LOCALS:       i32.add
+; NOSORT-LOCALS-NOT:   local.tee
+; NOSORT-LOCALS-NEXT:  local.set
+
+catch.dispatch0:                                  ; preds = %bb0
+  %0 = catchswitch within none [label %catch.start0] unwind to caller
+
+catch.start0:                                     ; preds = %catch.dispatch0
+  %1 = catchpad within %0 [ptr null]
+  %2 = call ptr @llvm.wasm.get.exception(token %1)
+  %3 = call i32 @llvm.wasm.get.ehselector(token %1)
+  catchret from %1 to label %try.cont
+
+try.cont:                                         ; preds = %catch.start0
+  ret void
+}
+
+; In CFGSort, EH pads should be sorted as soon as they are available in the
+; 'Preferred' queue and should NOT be entered into the 'Ready' queue unless we
+; are in the middle of sorting another region that does not contain the EH pad.
+; In this example, 'catch.start' should be sorted right after 'if.then' is
+; sorted (before 'cont' is sorted) and there should not be any unwind
+; destination mismatches in CFGStackify.
+
+; NOOPT-LABEL: cfg_sort_order:
+; NOOPT: block
+; NOOPT:   block
+; NOOPT:     block     i32
+; NOOPT:       try_table    (catch __cpp_exception 0)
+; NOOPT:         call  foo
+; NOOPT:       end_try_table
+; NOOPT:     end_block
+; NOOPT:     call  __cxa_begin_catch
+; NOOPT:     call  __cxa_end_catch
+; NOOPT:   end_block
+; NOOPT:   call  foo
+; NOOPT: end_block
+; NOOPT: return
+define void @cfg_sort_order(i32 %arg) personality ptr @__gxx_wasm_personality_v0 {
+entry:
+  %tobool = icmp ne i32 %arg, 0
+  br i1 %tobool, label %if.then, label %if.end
+
+catch.dispatch:                                   ; preds = %if.then
+  %0 = catchswitch within none [label %catch.start] unwind to caller
+
+catch.start:                                      ; preds = %catch.dispatch
+  %1 = catchpad within %0 [ptr null]
+  %2 = call ptr @llvm.wasm.get.exception(token %1)
+  %3 = call i32 @llvm.wasm.get.ehselector(token %1)
+  %4 = call ptr @__cxa_begin_catch(ptr %2) [ "funclet"(token %1) ]
+  call void @__cxa_end_catch() [ "funclet"(token %1) ]
+  catchret from %1 to label %if.end
+
+if.then:                                          ; preds = %entry
+  invoke void @foo()
+          to label %cont unwind label %catch.dispatch
+
+cont:                                             ; preds = %if.then
+  call void @foo()
+  br label %if.end
+
+if.end:                                           ; preds = %cont, %catch.start, %entry
+  ret void
+}
+
+; Intrinsics like memcpy, memmove, and memset don't throw and are lowered into
+; calls to external symbols (not global addresses) in instruction selection,
+; which will eventually be lowered to library function calls.
+; Because this test runs with -wasm-disable-ehpad-sort, these library calls in
+; the invoke.cont BB fall within try_table-end_try_table, but they shouldn't
+; cause crashes or unwind destination mismatches in CFGStackify.
+
+; NOSORT-LABEL: mem_intrinsics:
+; NOSORT: block     exnref
+; NOSORT:   try_table    (catch_all_ref 0)
+; NOSORT:     call  foo
+; NOSORT:     call  memcpy
+; NOSORT:     call  memmove
+; NOSORT:     call  memset
+; NOSORT:     return
+; NOSORT:   end_try_table
+; NOSORT: end_block
+; NOSORT: throw_ref
+define void @mem_intrinsics(ptr %a, ptr %b) personality ptr @__gxx_wasm_personality_v0 {
+entry:
+  %o = alloca %class.Object, align 1
+  invoke void @foo()
+          to label %invoke.cont unwind label %ehcleanup
+
+invoke.cont:                                      ; preds = %entry
+  call void @llvm.memcpy.p0.p0.i32(ptr %a, ptr %b, i32 100, i1 false)
+  call void @llvm.memmove.p0.p0.i32(ptr %a, ptr %b, i32 100, i1 false)
+  call void @llvm.memset.p0.i32(ptr %a, i8 0, i32 100, i1 false)
+  %call = call ptr @_ZN6ObjectD2Ev(ptr %o)
+  ret void
+
+ehcleanup:                                        ; preds = %entry
+  %0 = cleanuppad within none []
+  %call2 = call ptr @_ZN6ObjectD2Ev(ptr %o) [ "funclet"(token %0) ]
+  cleanupret from %0 unwind to caller
+}
+
+; Tests if 'try_table' marker is placed correctly. In this test, 'try_table'
+; should be placed before the call to 'nothrow_i32' and not between the call to
+; 'nothrow_i32' and 'fun', because the return value of 'nothrow_i32' is
+; stackified and pushed onto the stack to be consumed by the call to 'fun'.
+
+; CHECK-LABEL: try_table_marker_with_stackified_input:
+; CHECK: try_table    (catch_all 0)
+; CHECK: call  nothrow_i32
+; CHECK: call  fun
+define void @try_table_marker_with_stackified_input() personality ptr @__gxx_wasm_personality_v0 {
+entry:
+  %call = call i32 @nothrow_i32()
+  invoke void @fun(i32 %call)
+          to label %invoke.cont unwind label %terminate
+
+invoke.cont:                                      ; preds = %entry
+  ret void
+
+terminate:                                        ; preds = %entry
+  %0 = cleanuppad within none []
+  call void @_ZSt9terminatev() [ "funclet"(token %0) ]
+  unreachable
+}
+
+; This crashed in debug mode (= when NDEBUG is not defined) when the logic for
+; computing the innermost region was not correct, in a case where a loop region
+; contains an exception region. This should pass CFGSort without crashing.
+define void @loop_exception_region() personality ptr @__gxx_wasm_personality_v0 {
+entry:
+  %e = alloca %class.MyClass, align 4
+  br label %for.cond
+
+for.cond:                                         ; preds = %for.inc, %entry
+  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
+  %cmp = icmp slt i32 %i.0, 9
+  br i1 %cmp, label %for.body, label %for.end
+
+for.body:                                         ; preds = %for.cond
+  invoke void @quux(i32 %i.0)
+          to label %for.inc unwind label %catch.dispatch
+
+catch.dispatch:                                   ; preds = %for.body
+  %0 = catchswitch within none [label %catch.start] unwind to caller
+
+catch.start:                                      ; preds = %catch.dispatch
+  %1 = catchpad within %0 [ptr @_ZTI7MyClass]
+  %2 = call ptr @llvm.wasm.get.exception(token %1)
+  %3 = call i32 @llvm.wasm.get.ehselector(token %1)
+  %4 = call i32 @llvm.eh.typeid.for(ptr @_ZTI7MyClass)
+  %matches = icmp eq i32 %3, %4
+  br i1 %matches, label %catch, label %rethrow
+
+catch:                                            ; preds = %catch.start
+  %5 = call ptr @__cxa_get_exception_ptr(ptr %2) [ "funclet"(token %1) ]
+  %call = call ptr @_ZN7MyClassC2ERKS_(ptr %e, ptr dereferenceable(4) %5) [ "funclet"(token %1) ]
+  %6 = call ptr @__cxa_begin_catch(ptr %2) [ "funclet"(token %1) ]
+  %7 = load i32, ptr %e, align 4
+  invoke void @quux(i32 %7) [ "funclet"(token %1) ]
+          to label %invoke.cont2 unwind label %ehcleanup
+
+invoke.cont2:                                     ; preds = %catch
+  %call3 = call ptr @_ZN7MyClassD2Ev(ptr %e) [ "funclet"(token %1) ]
+  call void @__cxa_end_catch() [ "funclet"(token %1) ]
+  catchret from %1 to label %for.inc
+
+rethrow:                                          ; preds = %catch.start
+  call void @llvm.wasm.rethrow() [ "funclet"(token %1) ]
+  unreachable
+
+for.inc:                                          ; preds = %invoke.cont2, %for.body
+  %inc = add nsw i32 %i.0, 1
+  br label %for.cond
+
+ehcleanup:                                        ; preds = %catch
+  %8 = cleanuppad within %1 []
+  %call4 = call ptr @_ZN7MyClassD2Ev(ptr %e) [ "funclet"(token %8) ]
+  invoke void @__cxa_end_catch() [ "funclet"(token %8) ]
+          to label %invoke.cont6 unwind label %terminate7
+
+invoke.cont6:                                     ; preds = %ehcleanup
+  cleanupret from %8 unwind to caller
+
+for.end:                                          ; preds = %for.cond
+  ret void
+
+terminate7:                                       ; preds = %ehcleanup
+  %9 = cleanuppad within %8 []
+  call void @_ZSt9terminatev() [ "funclet"(token %9) ]
+  unreachable
+}
+
+; Here exceptions are semantically contained in a loop. 'ehcleanup' BB belongs
+; to the exception, but does not belong to the loop (because it does not have a
+; path back to the loop header), and is placed after the loop latch block
+; 'invoke.cont' intentionally. This tests that the 'end_loop' marker is placed
+; not right after the 'invoke.cont' part but after the 'ehcleanup' part.
+; NOSORT-LABEL: loop_contains_exception:
+; NOSORT: loop
+; NOSORT:   try_table    (catch __cpp_exception 0)
+; NOSORT:   end_try_table
+; NOSORT:   try_table    (catch_all 0)
+; NOSORT:   end_try_table
+; NOSORT: end_loop
+define void @loop_contains_exception(i32 %n) personality ptr @__gxx_wasm_personality_v0 {
+entry:
+  br label %while.cond
+
+while.cond:                                       ; preds = %invoke.cont, %entry
+  %n.addr.0 = phi i32 [ %n, %entry ], [ %dec, %invoke.cont ]
+  %tobool = icmp ne i32 %n.addr.0, 0
+  br i1 %tobool, label %while.body, label %while.end
+
+while.body:                                       ; preds = %while.cond
+  %dec = add nsw i32 %n.addr.0, -1
+  invoke void @foo()
+          to label %while.end unwind label %catch.dispatch
+
+catch.dispatch:                                   ; preds = %while.body
+  %0 = catchswitch within none [label %catch.start] unwind to caller
+
+catch.start:                                      ; preds = %catch.dispatch
+  %1 = catchpad within %0 [ptr null]
+  %2 = call ptr @llvm.wasm.get.exception(token %1)
+  %3 = call i32 @llvm.wasm.get.ehselector(token %1)
+  %4 = call ptr @__cxa_begin_catch(ptr %2) [ "funclet"(token %1) ]
+  invoke void @__cxa_end_catch() [ "funclet"(token %1) ]
+          to label %invoke.cont unwind label %ehcleanup
+
+invoke.cont:                                      ; preds = %catch.start
+  catchret from %1 to label %while.cond
+
+ehcleanup:                                        ; preds = %catch.start
+  %5 = cleanuppad within %1 []
+  call void @_ZSt9terminatev() [ "funclet"(token %5) ]
+  unreachable
+
+while.end:                                        ; preds = %while.body, %while.cond
+  ret void
+}
+
+; Regression test for WasmEHFuncInfo's reverse mapping bug. 'UnwindDestToSrc'
+; should return a vector and not a single BB, which was incorrect.
+; This was reduced by bugpoint and should not crash in CFGStackify.
+define void @wasm_eh_func_info_regression_test() personality ptr @__gxx_wasm_personality_v0 {
+entry:
+  invoke void @foo()
+          to label %invoke.cont unwind label %catch.dispatch
+
+catch.dispatch:                                   ; preds = %entry
+  %0 = catchswitch within none [label %catch.start] unwind label %ehcleanup22
+
+catch.start:                                      ; preds = %catch.dispatch
+  %1 = catchpad within %0 [ptr @_ZTIi]
+  %2 = call ptr @llvm.wasm.get.exception(token %1)
+  %3 = call i32 @llvm.wasm.get.ehselector(token %1)
+  invoke void @__cxa_throw(ptr null, ptr null, ptr null) #1 [ "funclet"(token %1) ]
+          to label %unreachable unwind label %catch.dispatch2
+
+catch.dispatch2:                                  ; preds = %catch.start
+  %4 = catchswitch within %1 [label %catch.start3] unwind label %ehcleanup
+
+catch.start3:                                     ; preds = %catch.dispatch2
+  %5 = catchpad within %4 [ptr @_ZTIi]
+  %6 = call ptr @llvm.wasm.get.exception(token %5)
+  %7 = call i32 @llvm.wasm.get.ehselector(token %5)
+  catchret from %5 to label %try.cont
+
+try.cont:                                         ; preds = %catch.start3
+  invoke void @foo() [ "funclet"(token %1) ]
+          to label %invoke.cont8 unwind label %ehcleanup
+
+invoke.cont8:                                     ; preds = %try.cont
+  invoke void @__cxa_throw(ptr null, ptr null, ptr null) #1 [ "funclet"(token %1) ]
+          to label %unreachable unwind label %catch.dispatch11
+
+catch.dispatch11:                                 ; preds = %invoke.cont8
+  %8 = catchswitch within %1 [label %catch.start12] unwind label %ehcleanup
+
+catch.start12:                                    ; preds = %catch.dispatch11
+  %9 = catchpad within %8 [ptr @_ZTIi]
+  %10 = call ptr @llvm.wasm.get.exception(token %9)
+  %11 = call i32 @llvm.wasm.get.ehselector(token %9)
+  unreachable
+
+invoke.cont:                                      ; preds = %entry
+  unreachable
+
+ehcleanup:                                        ; preds = %catch.dispatch11, %try.cont, %catch.dispatch2
+  %12 = cleanuppad within %1 []
+  cleanupret from %12 unwind label %ehcleanup22
+
+ehcleanup22:                                      ; preds = %ehcleanup, %catch.dispatch
+  %13 = cleanuppad within none []
+  cleanupret from %13 unwind to caller
+
+unreachable:                                      ; preds = %invoke.cont8, %catch.start
+  unreachable
+}
+
+; void exception_grouping_0() {
+;   try {
+;     try {
+;       throw 0;
+;     } catch (int) {
+;     }
+;   } catch (int) {
+;   }
+; }
+;
+; Regression test for a WebAssemblyException grouping bug. After catchswitches
+; are removed, EH pad catch.start2 is dominated by catch.start, but because
+; catch.start2 is the unwind destination of catch.start, it should not be
+; included in catch.start's exception. Also, after we take catch.start2's
+; exception out of catch.start's exception, we have to take try.cont8 out of
+; catch.start's exception, because it has a predecessor in catch.start2.
+define void @exception_grouping_0() personality ptr @__gxx_wasm_personality_v0 {
+entry:
+  %exception = call ptr @__cxa_allocate_exception(i32 4) #0
+  store i32 0, ptr %exception, align 16
+  invoke void @__cxa_throw(ptr %exception, ptr @_ZTIi, ptr null) #1
+          to label %unreachable unwind label %catch.dispatch
+
+catch.dispatch:                                   ; preds = %entry
+  %0 = catchswitch within none [label %catch.start] unwind label %catch.dispatch1
+
+catch.start:                                      ; preds = %catch.dispatch
+  %1 = catchpad within %0 [ptr @_ZTIi]
+  %2 = call ptr @llvm.wasm.get.exception(token %1)
+  %3 = call i32 @llvm.wasm.get.ehselector(token %1)
+  %4 = call i32 @llvm.eh.typeid.for(ptr @_ZTIi) #0
+  %matches = icmp eq i32 %3, %4
+  br i1 %matches, label %catch, label %rethrow
+
+catch:                                            ; preds = %catch.start
+  %5 = call ptr @__cxa_begin_catch(ptr %2) #0 [ "funclet"(token %1) ]
+  %6 = load i32, ptr %5, align 4
+  call void @__cxa_end_catch() #0 [ "funclet"(token %1) ]
+  catchret from %1 to label %catchret.dest
+
+catchret.dest:                                    ; preds = %catch
+  br label %try.cont
+
+rethrow:                                          ; preds = %catch.start
+  invoke void @llvm.wasm.rethrow() #1 [ "funclet"(token %1) ]
+          to label %unreachable unwind label %catch.dispatch1
+
+catch.dispatch1:                                  ; preds = %rethrow, %catch.dispatch
+  %7 = catchswitch within none [label %catch.start2] unwind to caller
+
+catch.start2:                                     ; preds = %catch.dispatch1
+  %8 = catchpad within %7 [ptr @_ZTIi]
+  %9 = call ptr @llvm.wasm.get.exception(token %8)
+  %10 = call i32 @llvm.wasm.get.ehselector(token %8)
+  %11 = call i32 @llvm.eh.typeid.for(ptr @_ZTIi) #0
+  %matches3 = icmp eq i32 %10, %11
+  br i1 %matches3, label %catch5, label %rethrow4
+
+catch5:                                           ; preds = %catch.start2
+  %12 = call ptr @__cxa_begin_catch(ptr %9) #0 [ "funclet"(token %8) ]
+  %13 = load i32, ptr %12, align 4
+  call void @__cxa_end_catch() #0 [ "funclet"(token %8) ]
+  catchret from %8 to label %catchret.dest7
+
+catchret.dest7:                                   ; preds = %catch5
+  br label %try.cont8
+
+rethrow4:                                         ; preds = %catch.start2
+  call void @llvm.wasm.rethrow() #1 [ "funclet"(token %8) ]
+  unreachable
+
+try.cont8:                                        ; preds = %try.cont, %catchret.dest7
+  ret void
+
+try.cont:                                         ; preds = %catchret.dest
+  br label %try.cont8
+
+unreachable:                                      ; preds = %rethrow, %entry
+  unreachable
+}
+
+; Test for WebAssemblyException grouping. This test is hand-modified to generate
+; this structure:
+; catch.start dominates catch.start4 and catch.start4 dominates catch.start12,
+; so after the dominator-based grouping, we end up with:
+; catch.start's exception > catch.start4's exception > catch.start12's exception
+; (> here represents subexception relationship)
+;
+; But the unwind destination chain is catch.start -> catch.start4 ->
+; catch.start12. So all these subexception relationships should be deconstructed.
+; We have to make sure to take catch.start4's exception out of catch.start's
+; exception first, before taking catch.start12's exception out of
+; catch.start4's exception; otherwise we end up with an incorrect relationship
+; of catch.start's exception > catch.start12's exception.
+define void @exception_grouping_1() personality ptr @__gxx_wasm_personality_v0 {
+entry:
+  invoke void @foo()
+          to label %invoke.cont unwind label %catch.dispatch
+
+invoke.cont:                                      ; preds = %entry
+  invoke void @foo()
+          to label %invoke.cont1 unwind label %catch.dispatch
+
+invoke.cont1:                                     ; preds = %invoke.cont
+  invoke void @foo()
+          to label %try.cont18 unwind label %catch.dispatch
+
+catch.dispatch11:                                 ; preds = %rethrow6, %catch.dispatch3
+  %0 = catchswitch within none [label %catch.start12] unwind to caller
+
+catch.start12:                                    ; preds = %catch.dispatch11
+  %1 = catchpad within %0 [ptr @_ZTIi]
+  %2 = call ptr @llvm.wasm.get.exception(token %1)
+  %3 = call i32 @llvm.wasm.get.ehselector(token %1)
+  %4 = call i32 @llvm.eh.typeid.for(ptr @_ZTIi) #0
+  %matches13 = icmp eq i32 %3, %4
+  br i1 %matches13, label %catch15, label %rethrow14
+
+catch15:                                          ; preds = %catch.start12
+  %5 = call ptr @__cxa_begin_catch(ptr %2) #0 [ "funclet"(token %1) ]
+  %6 = load i32, ptr %5, align 4
+  call void @__cxa_end_catch() #0 [ "funclet"(token %1) ]
+  catchret from %1 to label %try.cont18
+
+rethrow14:                                        ; preds = %catch.start12
+  call void @llvm.wasm.rethrow() #1 [ "funclet"(token %1) ]
+  unreachable
+
+catch.dispatch3:                                  ; preds = %rethrow, %catch.dispatch
+  %7 = catchswitch within none [label %catch.start4] unwind label %catch.dispatch11
+
+catch.start4:                                     ; preds = %catch.dispatch3
+  %8 = catchpad within %7 [ptr @_ZTIi]
+  %9 = call ptr @llvm.wasm.get.exception(token %8)
+  %10 = call i32 @llvm.wasm.get.ehselector(token %8)
+  %11 = call i32 @llvm.eh.typeid.for(ptr @_ZTIi) #0
+  %matches5 = icmp eq i32 %10, %11
+  br i1 %matches5, label %catch7, label %rethrow6
+
+catch7:                                           ; preds = %catch.start4
+  %12 = call ptr @__cxa_begin_catch(ptr %9) #0 [ "funclet"(token %8) ]
+  %13 = load i32, ptr %12, align 4
+  call void @__cxa_end_catch() #0 [ "funclet"(token %8) ]
+  catchret from %8 to label %try.cont18
+
+rethrow6:                                         ; preds = %catch.start4
+  invoke void @llvm.wasm.rethrow() #1 [ "funclet"(token %8) ]
+          to label %unreachable unwind label %catch.dispatch11
+
+catch.dispatch:                                   ; preds = %invoke.cont1, %invoke.cont, %entry
+  %14 = catchswitch within none [label %catch.start] unwind label %catch.dispatch3
+
+catch.start:                                      ; preds = %catch.dispatch
+  %15 = catchpad within %14 [ptr @_ZTIi]
+  %16 = call ptr @llvm.wasm.get.exception(token %15)
+  %17 = call i32 @llvm.wasm.get.ehselector(token %15)
+  %18 = call i32 @llvm.eh.typeid.for(ptr @_ZTIi) #0
+  %matches = icmp eq i32 %17, %18
+  br i1 %matches, label %catch, label %rethrow
+
+catch:                                            ; preds = %catch.start
+  %19 = call ptr @__cxa_begin_catch(ptr %16) #0 [ "funclet"(token %15) ]
+  %20 = load i32, ptr %19, align 4
+  call void @__cxa_end_catch() #0 [ "funclet"(token %15) ]
+  catchret from %15 to label %try.cont18
+
+rethrow:                                          ; preds = %catch.start
+  invoke void @llvm.wasm.rethrow() #1 [ "funclet"(token %15) ]
+          to label %unreachable unwind label %catch.dispatch3
+
+try.cont18:                                       ; preds = %catch, %catch7, %catch15, %invoke.cont1
+  ret void
+
+unreachable:                                      ; preds = %rethrow, %rethrow6
+  unreachable
+}
+
+; void exception_grouping_2() {
+;   try {
+;     try {
+;       throw 0;
+;     } catch (int) { // (a)
+;     }
+;   } catch (int) {   // (b)
+;   }
+;   try {
+;     foo();
+;   } catch (int) {   // (c)
+;   }
+; }
+;
+; Regression test for an ExceptionInfo grouping bug. Because the first (inner)
+; try always throws, both EH pads (b) (catch.start2) and (c) (catch.start10) are
+; dominated by EH pad (a) (catch.start), even though they are not semantically
+; contained in (a)'s exception. Because (a)'s unwind destination is (b), (b)'s
+; exception is taken out of (a)'s. But because (c) is reachable from (b), we
+; should make sure to take (c)'s exception out of (a)'s exception too.
+define void @exception_grouping_2() personality ptr @__gxx_wasm_personality_v0 {
+entry:
+  %exception = call ptr @__cxa_allocate_exception(i32 4) #1
+  store i32 0, ptr %exception, align 16
+  invoke void @__cxa_throw(ptr %exception, ptr @_ZTIi, ptr null) #3
+          to label %unreachable unwind label %catch.dispatch
+
+catch.dispatch:                                   ; preds = %entry
+  %0 = catchswitch within none [label %catch.start] unwind label %catch.dispatch1
+
+catch.start:                                      ; preds = %catch.dispatch
+  %1 = catchpad within %0 [ptr @_ZTIi]
+  %2 = call ptr @llvm.wasm.get.exception(token %1)
+  %3 = call i32 @llvm.wasm.get.ehselector(token %1)
+  %4 = call i32 @llvm.eh.typeid.for(ptr @_ZTIi) #1
+  %matches = icmp eq i32 %3, %4
+  br i1 %matches, label %catch, label %rethrow
+
+catch:                                            ; preds = %catch.start
+  %5 = call ptr @__cxa_begin_catch(ptr %2) #1 [ "funclet"(token %1) ]
+  %6 = load i32, ptr %5, align 4
+  call void @__cxa_end_catch() #1 [ "funclet"(token %1) ]
+  catchret from %1 to label %try.cont8
+
+rethrow:                                          ; preds = %catch.start
+  invoke void @llvm.wasm.rethrow() #3 [ "funclet"(token %1) ]
+          to label %unreachable unwind label %catch.dispatch1
+
+catch.dispatch1:                                  ; preds = %rethrow, %catch.dispatch
+  %7 = catchswitch within none [label %catch.start2] unwind to caller
+
+catch.start2:                                     ; preds = %catch.dispatch1
+  %8 = catchpad within %7 [ptr @_ZTIi]
+  %9 = call ptr @llvm.wasm.get.exception(token %8)
+  %10 = call i32 @llvm.wasm.get.ehselector(token %8)
+  %11 = call i32 @llvm.eh.typeid.for(ptr @_ZTIi) #1
+  %matches3 = icmp eq i32 %10, %11
+  br i1 %matches3, label %catch5, label %rethrow4
+
+catch5:                                           ; preds = %catch.start2
+  %12 = call ptr @__cxa_begin_catch(ptr %9) #1 [ "funclet"(token %8) ]
+  %13 = load i32, ptr %12, align 4
+  call void @__cxa_end_catch() #1 [ "funclet"(token %8) ]
+  catchret from %8 to label %try.cont8
+
+rethrow4:                                         ; preds = %catch.start2
+  call void @llvm.wasm.rethrow() #3 [ "funclet"(token %8) ]
+  unreachable
+
+try.cont8:                                        ; preds = %catch, %catch5
+  invoke void @foo()
+          to label %try.cont16 unwind label %catch.dispatch9
+
+catch.dispatch9:                                  ; preds = %try.cont8
+  %14 = catchswitch within none [label %catch.start10] unwind to caller
+
+catch.start10:                                    ; preds = %catch.dispatch9
+  %15 = catchpad within %14 [ptr @_ZTIi]
+  %16 = call ptr @llvm.wasm.get.exception(token %15)
+  %17 = call i32 @llvm.wasm.get.ehselector(token %15)
+  %18 = call i32 @llvm.eh.typeid.for(ptr @_ZTIi) #1
+  %matches11 = icmp eq i32 %17, %18
+  br i1 %matches11, label %catch13, label %rethrow12
+
+catch13:                                          ; preds = %catch.start10
+  %19 = call ptr @__cxa_begin_catch(ptr %16) #1 [ "funclet"(token %15) ]
+  %20 = load i32, ptr %19, align 4
+  call void @__cxa_end_catch() #1 [ "funclet"(token %15) ]
+  catchret from %15 to label %try.cont16
+
+rethrow12:                                        ; preds = %catch.start10
+  call void @llvm.wasm.rethrow() #3 [ "funclet"(token %15) ]
+  unreachable
+
+try.cont16:                                       ; preds = %try.cont8, %catch13
+  ret void
+
+unreachable:                                      ; preds = %rethrow, %entry
+  unreachable
+}
+
+; Check if the unwind destination mismatch stats are correct
+; NOSORT: 23 wasm-cfg-stackify    - Number of call unwind mismatches found
+; NOSORT:  4 wasm-cfg-stackify    - Number of catch unwind mismatches found
+
+declare void @foo()
+declare void @bar()
+declare i32 @baz()
+declare i32 @qux(i32)
+declare void @quux(i32)
+declare void @fun(i32)
+; Function Attrs: nounwind
+declare void @nothrow(i32) #0
+; Function Attrs: nounwind
+declare i32 @nothrow_i32() #0
+
+; Function Attrs: nounwind
+declare ptr @_ZN6ObjectD2Ev(ptr returned) #0
+@_ZTI7MyClass = external constant { ptr, ptr }, align 4
+; Function Attrs: nounwind
+declare ptr @_ZN7MyClassD2Ev(ptr returned) #0
+; Function Attrs: nounwind
+declare ptr @_ZN7MyClassC2ERKS_(ptr returned, ptr dereferenceable(4)) #0
+
+declare i32 @__gxx_wasm_personality_v0(...)
+; Function Attrs: nounwind
+declare ptr @llvm.wasm.get.exception(token) #0
+; Function Attrs: nounwind
+declare i32 @llvm.wasm.get.ehselector(token) #0
+declare ptr @__cxa_allocate_exception(i32) #0
+declare void @__cxa_throw(ptr, ptr, ptr)
+; Function Attrs: noreturn
+declare void @llvm.wasm.rethrow() #1
+; Function Attrs: nounwind
+declare i32 @llvm.eh.typeid.for(ptr) #0
+
+declare ptr @__cxa_begin_catch(ptr)
+declare void @__cxa_end_catch()
+declare ptr @__cxa_get_exception_ptr(ptr)
+declare void @_ZSt9terminatev()
+; Function Attrs: nounwind
+declare void @llvm.memcpy.p0.p0.i32(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i32, i1 immarg) #0
+; Function Attrs: nounwind
+declare void @llvm.memmove.p0.p0.i32(ptr nocapture, ptr nocapture readonly, i32, i1 immarg) #0
+; Function Attrs: nounwind
+declare void @llvm.memset.p0.i32(ptr nocapture writeonly, i8, i32, i1 immarg) #0
+
+attributes #0 = { nounwind }
+attributes #1 = { noreturn }