[PATCH] D86805: [lld-macho] create TEXT,unwind_info from LD,compact_unwind

Thu Sep 3 14:57:47 PDT 2020

int3 added inline comments.

================
Comment at: lld/MachO/UnwindInfo.h:51-66
+  mutable std::map<uint32_t, size_t> commonEncodingIndexes;
+  mutable std::vector<std::pair<compact_unwind_encoding_t, size_t>>
+      commonEncodings;
+  mutable std::vector<uint32_t> personalities;
+  mutable std::vector<unwind_info_section_header_lsda_entry> lsdaEntries;
+  mutable std::vector<compact_unwind_entry> cuVector;
+  mutable std::vector<compact_unwind_entry *> cuPtrVector;
----------------
gkm wrote:
> gkm wrote:
> > int3 wrote:
> > > int3 wrote:
> > > > gkm wrote:
> > > > > int3 wrote:
> > > > > > why the need to mark everything as mutable?
> > > > > The work of morphing `__LD,__compact_unwind` into `__TEXT,__unwind_info` happens in `uint64_t getSize() const` and `void writeTo(uint8_t *buf) const`, which can only be logically `const`, but not physically.
> > > > I would much prefer if we could have a method that explicitly computes the unwind info... but first let me see if I understand the constraints here.
> > > > 
> > > > From what I understand, the size of the unwind info depends on the values of `CompactUnwindEntry64::functionAddress`, or more precisely, their relative offsets. So we need to assign addresses to our functions before computing the unwind info size. After addresses are assigned, the `cuSection->writeTo()` call below will update the `functionAddress` entries to point to the final addresses, after which we can compute the needed size. Does all that sound right?
> > > > 
> > > > From what I can tell, ld64 calculates unwind info size based on "tentative addresses", which I think have the right relative offsets, though their final values have yet to be fixed. I wonder if we could do something similar with our current design...
> > > Also -- do you know if it's possible for unwind info to point to functions that reside in sections other than `__text`?
> > Yes, that's substantially correct, and I will elaborate:
> > 
> > `UnwindInfoSection::getSize()` must process the compact unwind entries and partition them into 4 KiB second-level pages in order to determine the final size of the `__TEXT,__unwind_info`. There are three factors that influence layout & therefore size:
> >   # A maximum of 1021 entries can fit into a 4 KiB page
> >   # The difference between address of first and last functions in a second-level page must be less than 2^24. The page will be cut short and remain under-filled when a candidate entry's 24-bit relative offset field would overflow.
> >   # We do a linear scan and fold adjacent entries that have the same encoding, thus laying-out fewer entries in the output than are present in the input.
> > 
> > As currently implemented, I use `mutable` data members to cache the necessary and substantial work done in `UnwindInfoSection::getSize() const` so that `UnwindInfoSection::writeTo() const` need not redo it.
> > 
> > I believe it is not strictly necessary to relocate the compact unwind entries in `getSize()`. I can probably work with the raw input sections for page partitioning and entry folding, though the code will be trickier, and I won't escape the need to relocate later in `writeTo()` for sake of LSDA and personality addresses.
> > 
> > I chose to use data members as cache, to relocate early, and to use `mutable` to work-around constness of `getSize()` and `writeTo()` because that results in code that does things in the proper sequence, and in the most straightforward manner for sake of clarity and efficiency.
> With assembler language, anything is possible. I assume one can place compiled functions in alternate sections via attributes or pragmas. Those sections might be discontiguous with `__text`, and therefore require lay-out in to distinct second-level pages. I don't know if ld64 supports non-`__text` compact unwind. I will need to conjure some test cases to see how it behaves. If ld64 doesn't handle non-`__text`, then we won't either. If it does, then I will add that later.
> I believe it is not strictly necessary to relocate the compact unwind entries in getSize(). I can probably work with the raw input sections for page partitioning and entry folding

How about doing most of what's currently in `getSize()` in a `finalizeContents()` method instead, much like what is currently being done for the the `__LINKEDIT` sections? My main concern is -- aside from a mutating method being marked as `const` -- is that compact unwind's `getSize()` has an implicit dependency on the addresses of another section. Other synthetic sections' `finalizeContent` methods have such a dependency, too, so this would fit the mold.

Of course, unlike the `__LINKEDIT` sections, the address assignments of later sections do depend on compact unwind's size, so we'll have to call it in the middle of `assignAddresses`. Such special-casing of CW would be a bit more verbose than what we currently have, but I think it's a good thing to make the uniqueness of compact unwind's requirements explicit.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86805/new/

https://reviews.llvm.org/D86805

[PATCH] D86805: [lld-macho] create __TEXT,__unwind_info from __LD,__compact_unwind

[PATCH] D86805: [lld-macho] create TEXT,unwind_info from LD,compact_unwind