[PATCH] D86805: [lld-macho] create TEXT,unwind_info from LD,compact_unwind

Tue Sep 8 10:56:43 PDT 2020

int3 added a comment.

Having things in `finalize()` indeed looks a lot cleaner :D

================
Comment at: lld/MachO/SyntheticSections.h:22
 #include "llvm/Support/raw_ostream.h"
+#include <map>

----------------
this include doesn't seem necessary

================
Comment at: lld/MachO/UnwindInfo.cpp:91-92
+        if (targetIsec->getVA() == 0) {
+          error(Twine("__compact_unwind relocation is unresolved against " +
+                      targetIsec->segname + "," + targetIsec->name));
+        }
----------------
is the `Twine()` call here necessary? I would expect the concatenated StringRefs to already be a Twine

================
Comment at: lld/MachO/UnwindInfo.cpp:107
+  cuPtrVector.reserve(cuCount);
+  std::map<compact_unwind_encoding_t, size_t> encodingFrequencies;
+  for (size_t i = 0; i < cuCount;) {
----------------
ditto, avoid `<map>`

================
Comment at: lld/MachO/UnwindInfo.cpp:120
+  // make a table of common encodings, and sort by descending frequency
+  for (const auto &frequency : encodingFrequencies) {
+    commonEncodings.emplace_back(frequency);
----------------
codebase convention is to avoid curly braces for one-liner blocks

================
Comment at: lld/MachO/UnwindInfo.cpp:127
+              if (a.second == b.second)
+                // When freqnecies match, secondarily sort on encoding
+                // to maintain parity with validate-unwind-info.py
----------------
frequencies

================
Comment at: lld/MachO/UnwindInfo.cpp:147
+  // Record the page splits as a vector of iterators on cuPtrVector
+  // such that succesive elements form a semi-open interval. E.g.,
+  // page X's bounds are thus: [ pageBounds[X] .. pageBounds[X+1] )
----------------
successive

================
Comment at: lld/MachO/UnwindInfo.cpp:150
+  //
+  // Note that pageBouinds.size() is one greater than the number of
+  // pages, where pageBounds.cend() holds the sentinel cuPtrVector.cend()
----------------
pageBounds

================
Comment at: lld/MachO/UnwindInfo.cpp:155
+  for (size_t i = 0;;) {
+    // Limit the search to entries that can fit witin a 4 KiB page.
+    std::vector<const CompactUnwindEntry64 *>::const_iterator it0 =
----------------
within

================
Comment at: lld/MachO/UnwindInfo.cpp:156-158
+    std::vector<const CompactUnwindEntry64 *>::const_iterator it0 =
+        pageBounds[0] + i;
+    std::vector<const CompactUnwindEntry64 *>::const_iterator itN =
----------------
nit 1: the type here is pretty verbose, a type alias would be nice (`auto` would be acceptable here too I think)

nit 2: `it0` and `itN` aren't the most descriptive names... something like `intervalStart` and `intervalMax` would be nice

================
Comment at: lld/MachO/UnwindInfo.cpp:167-168
+        std::lower_bound(
+            it0, itN, nullptr,
+            [functionAddressMax](const CompactUnwindEntry64 *a, ...) {
+              return a->functionAddress < functionAddressMax;
----------------
why not pass `functionAddressMax` in as the 3rd parameter to `lower_bound`?

edit: oh, because the return type isn't the container element type. would be good to have a comment :)

================
Comment at: lld/MachO/UnwindInfo.cpp:193-196
+  // relocate __compact_unwind entries into cuBuf
+  if (getSize() == 0)
+    return;
+
----------------
can be rm'ed

================
Comment at: lld/MachO/UnwindInfo.cpp:211
+  // Common encodings
+  auto *i32p = reinterpret_cast<uint32_t *>(&uip[1]);
+  for (const auto &encoding : commonEncodings) {
----------------
would be good to have a comment to the effect that there are a bunch of integer arrays immediately after the section header (I had to pause and think about what `&uip[1]` meant)

================
Comment at: lld/MachO/UnwindInfo.cpp:257
+  for (size_t i = 0; i < pageBounds.size() - 1; i++) {
+    p2p->kind = 3;
+    p2p->entryPageOffset =
----------------
what does a `kind` of 3 indicate?

================
Comment at: lld/MachO/UnwindInfo.h:49
+private:
+  std::map<uint32_t, size_t> commonEncodingIndexes;
+  std::vector<std::pair<compact_unwind_encoding_t, size_t>> commonEncodings;
----------------
try to avoid `<map>` if possible: https://llvm.org/docs/ProgrammersManual.html#map

also the key here should be `compact_unwind_encoding_t` to be consistent :)

================
Comment at: lld/MachO/UnwindInfo.h:50
+  std::map<uint32_t, size_t> commonEncodingIndexes;
+  std::vector<std::pair<compact_unwind_encoding_t, size_t>> commonEncodings;
+  std::vector<uint32_t> personalities;
----------------
nit: I'd prefer structs over pairs for readable field names

================
Comment at: lld/MachO/Writer.cpp:413-414
+      .Case(segment_names::linkEdit, 100)
+      // __LD,__compact_unwind isn't an output segment, so shove it past the end
+      .Case(segment_names::ld, 101)
       .Default(0);
----------------
gkm wrote:
> int3 wrote:
> > This seems a bit hacky. Do we really have to put the `__compact_unwind` sections in an OutputSegment? If we're not going to output them, could we just put all these InputSections in a vector contained in `UnwindInfoSection`? 
> > 
> > Also, we may want to support emitting object files at some point via `-r`, in which case we'll actually want to emit the `__LD` segment at a valid position.
> I agree that it could use some improvement. I will revisit.
> 
> Behaviorally, `__LD,__compact_unwind` is a `MergedOutputSegment`, except with different timing and destination: we relocate & write it to a temp buffer from `Writer::assignAddresses()`) rather than writing to the final link output from `Writer::writeSections()`. We need to relocate it fully in order to compute the size of the `__TEXT,__unwind_info` section when laying-out sections & binding to addresses.
> 
> You are quite right regarding `-r`, which makes `__LD,__compact_unwind` a completely normal `MergedOutputSegment`.
I think it might be worthwhile to duplicate some of the code in `InputSection::writeTo` to specialize it for the compact unwind case. The current `writeTo` code assumes that both the source and target sections referenced by a relocation have their final output addresses determined. But this is not true for the compact unwind section -- it doesn't have a valid output address. It just happens to work because there are no pcrel relocations involved. We should really error out if we see a pcrel relocation while relocating `__compact_unwind`.

Creating a separate compact unwind relocating method would free us from having to create `Output{Segments, Sections}` that we discard later. I'm hoping we can retain the invariant where we add OutputSections to OutputSegments only after we have determined that they are needed.

================
Comment at: lld/test/MachO/compact-unwind.test:3-5
+# FIXME(gkm): This test is fast on a Release tree, and slow (~10s) on
+# a Debug tree mostly because of llvm-mc. Is there a way to prefer the
+# fast installed llvm-mc rather than the slow one in our Debug tree?
----------------
My understanding of the need for generate-cfi-funcs.py is that some test cases need to be very large in order to exercise the edge cases in the implementation. In addition to providing that functionality, this script also serves as a fuzzer by generating random test cases. Would it be possible for the unit test to instead enumerate a small number of cases that cover all important code paths, in order that we run fewer test cases?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86805/new/

https://reviews.llvm.org/D86805

[PATCH] D86805: [lld-macho] create __TEXT,__unwind_info from __LD,__compact_unwind

[PATCH] D86805: [lld-macho] create TEXT,unwind_info from LD,compact_unwind