[PATCH] D130049: [WinSEH][ARM64] Split unwind info for functions larger than 1MB

Thu Jul 21 14:36:44 PDT 2022

mstorsjo added a comment.

Overall this looks quite good to me, I mostly have a bunch of readability/understandability comments.

================
Comment at: llvm/include/llvm/MC/MCWinEH.h:72
+    MCSymbol *Symbol;
+    MapVector<MCSymbol *, int64_t> Epilogs;
+
----------------
Can you add a comment about what the `int64_t` is here? It's not obvious to me on the first read through the code.

================
Comment at: llvm/lib/MC/MCWin64EH.cpp:1006
+
+    while (RemainingLength >= SegLimit) {
+      // Try divide the function into segments, requirements:
----------------
Should this be `>`? Isn't it theoretically possible here that we have `RemainingLength == SegLimit`, we produce one segment and consume all of `RemainingLength`, and we'd end up adding a zero length segment at the end?

================
Comment at: llvm/lib/MC/MCWin64EH.cpp:1012
+      int64_t SegEnd = SegOffset + SegLength;
+      EpilogsInSegment.clear();
+
----------------
To me, it would be clearer if we'd move the declaration of `EpilogsInSegment` here, so we don't need to clear it, but we're operating on a new instance of it each time. Or are there (tangible?) performance benefits to doing it like this?

================
Comment at: llvm/lib/MC/MCWin64EH.cpp:1014
+
+      while (E < Epilogs.size() && Epilogs[E].End < SegEnd) {
+        // Epilogs within current segment.
----------------
Can we assume that all the epilogs are monotonically ordered? (I think it can be reasonable to do that, but maybe it'd be good to have an assert somewhere that tests it?)

================
Comment at: llvm/lib/MC/MCWin64EH.cpp:1021
+      // We either: 1. put all epilogs in segments already; Or
+      //            2. We found one that will cross segments boundry.
+      if (E < Epilogs.size() && Epilogs[E].Offset <= SegEnd)
----------------
By point 1., I presume you mean "we included all completely contained epilogs, and found another one which is entirely after the current segment" - or would you consider that a third case? It feels confusing to not have that case listed in any case (even if the code probably is correct).

================
Comment at: llvm/lib/MC/MCWin64EH.cpp:1027
+          SegOffset, SegLength, /* HasProlog */!SegOffset);
+      Seg.Epilogs = std::move(EpilogsInSegment);
+      info->Segments.push_back(Seg);
----------------
If you don't want to move the declaration of `EpilogsInSegment` to line 1012 above, it would be clearer to me to do the `.clear()` here, directly after the `std::move`, even if that does one unnecessary clean for the last iteration.

================
Comment at: llvm/lib/MC/MCWin64EH.cpp:1038
+  auto LastSeg =
+      WinEH::FrameInfo::Segment(SegOffset, RawFuncLength - SegOffset, !SegOffset);
+  for (; E < Epilogs.size(); ++E)
----------------
Please add the `/*HasProlog*/ comment here for the third parameter.

================
Comment at: llvm/lib/MC/MCWin64EH.cpp:1131
     if (CodeWords > 0xFF || EpilogCount > 0xFFFF)
-      report_fatal_error("SEH unwind data splitting not yet implemented");
+      report_fatal_error("SEH unwind data splitting not yet implemented,"
+                         "code words too large or too many epilogs");
----------------
Isn't `SEH unwind data splitting` kinda what's being implemented here? It's just that no split is being done for the condition of too many opcode bytes, only for the condition of too large functions. I.e. I'd like to rephrase the error message a bit more.

================
Comment at: llvm/lib/MC/MCWin64EH.cpp:1159
   // Emit prolog unwind instructions (in reverse order).
-  uint8_t numInst = info->Instructions.size();
-  for (uint8_t c = 0; c < numInst; ++c) {
-    WinEH::Instruction inst = info->Instructions.back();
-    info->Instructions.pop_back();
-    ARM64EmitUnwindCode(streamer, inst);
-  }
+  for (auto Inst : llvm::reverse(info->Instructions))
+    ARM64EmitUnwindCode(streamer, Inst);
----------------
I'd like to have a more verbose comment here to remind myself about how prologs in `!HasProlog` segments work - I always end up confused about this, e.g. something like this:

```
// Even for a segment with !HasProlog, we do need to emit the opcodes for the prolog, for the unwinder to use for unwinding from this segment.
// The end_c opcode at the start indicates to the unwinder that the actual prolog is outside of the current segment, and the unwinder shouldn't try to check for unwinding from a partial prolog.
```

================
Comment at: llvm/test/MC/AArch64/seh-large-func.s:9
+// CHECK-NEXT:     Number: 1
+// CHECK-NEXT:     Name: .text (2E 74 65 78 74 00 00 00)
+// CHECK-NEXT:     VirtualSize: 0x0
----------------
Is it meaningful to include all the section headers in the testcase here? They're mostly distracting here I think, and just extra work to update when working on the testcase in my experience.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130049/new/

https://reviews.llvm.org/D130049