[PATCH] D86805: [lld-macho] create __TEXT,__unwind_info from __LD,__compact_unwind
Greg McGary via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Sep 9 17:55:32 PDT 2020
gkm marked 18 inline comments as done.
gkm added inline comments.
================
Comment at: lld/MachO/UnwindInfo.cpp:90-92
+ if (targetIsec->getVA() == 0) {
+ error(Twine("__compact_unwind relocation is unresolved against " +
+ targetIsec->segname + "," + targetIsec->name));
----------------
smeenai wrote:
> int3 wrote:
> > Is this check really necessary? What kind of errors are we defending against? I would rather we not loop over the relocations unless necessary (for performance)
> Yup. I don't have the full context here yet, but https://lld.llvm.org/NewLLD.html#numbers-you-want-to-know is relevant.
This is an assertion--a sanity check on my own assumptions. Once I tested more widely, I will have no further use for it. A `TODO` comment to convey my intention is appropriate.
================
Comment at: lld/MachO/UnwindInfo.cpp:91-92
+ if (targetIsec->getVA() == 0) {
+ error(Twine("__compact_unwind relocation is unresolved against " +
+ targetIsec->segname + "," + targetIsec->name));
+ }
----------------
int3 wrote:
> is the `Twine()` call here necessary? I would expect the concatenated StringRefs to already be a Twine
Regarding `Twine()`: I meant to only pass the initial string constant so that later `+` operands would be chained to it. I did not mean to pass entire concat expression. I used explicit `Twine()` following Saleem's example in an earlier diff. I'm happy to drop it. I see that diag functions already coerce string constants to `Twine` anyway.
================
Comment at: lld/MachO/UnwindInfo.cpp:95-96
+ } else {
+ error(Twine("__compact_unwind relocation is not section based: " +
+ targetIsec->segname + "," + targetIsec->name));
+ }
----------------
int3 wrote:
> CU relocations can be section-based. Just checked a simple program:
>
> ```
> ~/tmp: llvm-readobj --relocations --expand-relocs bar.o
>
> File: bar.o
> Format: Mach-O 64-bit x86-64
> Arch: x86_64
> AddressSize: 64bit
> Relocations [
> Section __compact_unwind {
> Relocation {
> Offset: 0x0
> PCRel: 0
> Length: 3
> Type: X86_64_RELOC_UNSIGNED (0)
> Section: __text (1)
> }
> }
> ]
> ~/tmp: cat bar.cpp
> int foo() {
> return 123;
> }
> ```
Yes. I expect relocs to all reference sections whose address is assigned, i.e., `__TEXT,__text`. The assertion checks if this is ever NOT the case.
================
Comment at: lld/MachO/UnwindInfo.cpp:167-168
+ std::lower_bound(
+ it0, itN, nullptr,
+ [functionAddressMax](const CompactUnwindEntry64 *a, ...) {
+ return a->functionAddress < functionAddressMax;
----------------
int3 wrote:
> why not pass `functionAddressMax` in as the 3rd parameter to `lower_bound`?
>
> edit: oh, because the return type isn't the container element type. would be good to have a comment :)
I added a comment explaining the choice.
================
Comment at: lld/MachO/UnwindInfo.h:50
+ std::map<uint32_t, size_t> commonEncodingIndexes;
+ std::vector<std::pair<compact_unwind_encoding_t, size_t>> commonEncodings;
+ std::vector<uint32_t> personalities;
----------------
int3 wrote:
> nit: I'd prefer structs over pairs for readable field names
`commonEncodings` works best as a vector of pairs because it is filled from a `DenseMap` iterator which returns precisely the pairs we desire.
================
Comment at: lld/MachO/Writer.cpp:413-414
+ .Case(segment_names::linkEdit, 100)
+ // __LD,__compact_unwind isn't an output segment, so shove it past the end
+ .Case(segment_names::ld, 101)
.Default(0);
----------------
int3 wrote:
> gkm wrote:
> > int3 wrote:
> > > This seems a bit hacky. Do we really have to put the `__compact_unwind` sections in an OutputSegment? If we're not going to output them, could we just put all these InputSections in a vector contained in `UnwindInfoSection`?
> > >
> > > Also, we may want to support emitting object files at some point via `-r`, in which case we'll actually want to emit the `__LD` segment at a valid position.
> > I agree that it could use some improvement. I will revisit.
> >
> > Behaviorally, `__LD,__compact_unwind` is a `MergedOutputSegment`, except with different timing and destination: we relocate & write it to a temp buffer from `Writer::assignAddresses()`) rather than writing to the final link output from `Writer::writeSections()`. We need to relocate it fully in order to compute the size of the `__TEXT,__unwind_info` section when laying-out sections & binding to addresses.
> >
> > You are quite right regarding `-r`, which makes `__LD,__compact_unwind` a completely normal `MergedOutputSegment`.
> I think it might be worthwhile to duplicate some of the code in `InputSection::writeTo` to specialize it for the compact unwind case. The current `writeTo` code assumes that both the source and target sections referenced by a relocation have their final output addresses determined. But this is not true for the compact unwind section -- it doesn't have a valid output address. It just happens to work because there are no pcrel relocations involved. We should really error out if we see a pcrel relocation while relocating `__compact_unwind`.
>
> Creating a separate compact unwind relocating method would free us from having to create `Output{Segments, Sections}` that we discard later. I'm hoping we can retain the invariant where we add OutputSections to OutputSegments only after we have determined that they are needed.
I still owe you an answer for this one ...
================
Comment at: lld/test/MachO/compact-unwind.test:3-5
+# FIXME(gkm): This test is fast on a Release tree, and slow (~10s) on
+# a Debug tree mostly because of llvm-mc. Is there a way to prefer the
+# fast installed llvm-mc rather than the slow one in our Debug tree?
----------------
int3 wrote:
> My understanding of the need for generate-cfi-funcs.py is that some test cases need to be very large in order to exercise the edge cases in the implementation. In addition to providing that functionality, this script also serves as a fuzzer by generating random test cases. Would it be possible for the unit test to instead enumerate a small number of cases that cover all important code paths, in order that we run fewer test cases?
It is already a single test case, made deterministic by passing `--seed' so that `check-lld-macho` runs are consistent. The multiplicity that might be reduced is the number of functions generated for that one test case. However, since I need to test boundary conditions around breaking 1021-entry pages, my options are limited. If there is a way to prefer an installed `llvm-mc` over the locally built one, that would solve it.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D86805/new/
https://reviews.llvm.org/D86805
More information about the llvm-commits
mailing list