[PATCH] D76839: [lld-macho] Extend SyntheticSections to cover all segment load commands

Thu Apr 9 15:48:49 PDT 2020

int3 added a comment.

Sorry about the delay, got sidetracked for a bit...

================
Comment at: lld/MachO/OutputSegment.cpp:40
+  if (vec.empty() && !isec->isHidden()) {
+    ++numNonHiddenSections;
+  }
----------------
smeenai wrote:
> Hmm, what's up with the `vec.empty()` condition? Doesn't that mean `numNonHiddenSections` will only ever be 0 or 1, depending on if the first section `addSection` was called on for a particular segment was hidden or not?
This is to handle merged sections (well, aside from the fact that the rest of the code doesn't really do that yet). `vec.empty()` checks to see if we're creating a new output section or adding to an existing one.

================
Comment at: lld/MachO/OutputSegment.h:35
+  bool isNeeded() const {
+    return sections.size() != 0 || name == segment_names::linkEdit;
+  }
----------------
smeenai wrote:
> ld64 commonly checks segment or section names and takes particular actions based on that name. I'm not a fan of that design, personally, since it ties a lot of logic to those names. Having named constants for the names is a big improvement already, but I'm wondering if there's any way to design this that would sidestep the issue altogether.
> 
> For example, we could have a property of an OutputSegment like `alwaysNeeded`, which would be set for the linkEdit segment at its point of creation. Alternatively, we could have a LinkEditSegment override of OutputSegment which overrides `isNeeded`. The point would be to try to represent properties of the segment as attributes or subclasses, instead of computing them based on the name.
> 
> I'm not sure how that'd end up looking in practice though, or if it's worth it in all cases. @int3, @ruiu, what do you think?
Personally I try to avoid abstractions until they're definitively needed. At this point I'm not aware of any other segment that will benefit from `alwaysNeeded`

================
Comment at: lld/MachO/SyntheticSections.cpp:157
+uint32_t StringPoolSection::addString(StringRef str) {
+  uint32_t strx = poolSize;
+  pool.push_back(str);
----------------
smeenai wrote:
> `index` might be clearer than `strx`.
it's named this way because its value ends up populating `n_list::n_strx`

================
Comment at: lld/MachO/SyntheticSections.h:35
+// The header of the Mach-O file, which must have a file offset of zero.
+class MachHeaderSection : public InputSection {
+public:
----------------
smeenai wrote:
> Shouldn't all of these be inheriting from SyntheticSection?
I haven't defined such a class yet, mostly because I'm not yet sure what it should contain that `InputSection` doesn't. I'm thinking we can sort out the class hierarchy after we have a proper OutputSection class set up.

================
Comment at: lld/MachO/SyntheticSections.h:116
+// Stores the strings referenced by the symbol table.
+class StringPoolSection : public InputSection {
+public:
----------------
smeenai wrote:
> How come you're calling this the StringPoolSection instead of the StringTableSection? I thought string table was the standard terminology.
ld64 calls it the string pool ¯\_(ツ)_/¯ I'm fine with either though

================
Comment at: lld/MachO/Writer.cpp:259-261
+    return linkEditOffset + 1;
+  else
+    return linkEditOffset;
----------------
smeenai wrote:
> ruiu wrote:
> > int3 wrote:
> > > ruiu wrote:
> > > > So it looks like the function returns only 4 possible values -- 0, 1, linkEditOffset, or linkEditOffset+1. Can't you use 0, 1, 2, 3 instead?
> > > The idea is to make it easily extensible -- there may be other special sections that we want to order before the __LINKEDIT sections.
> > Got it, but I don't think you have to prepare this tiny function for future extension. You can just make it return 0,1,2,3 and then in a later change when you actually need to use linkEditOffset, you can start returning linkEditOffset and linkEditOffset+1.
> Looks like this comment still needs a response.
> You can just make it return 0,1,2,3 and then in a later change when you actually need to use linkEditOffset, you can start returning linkEditOffset and linkEditOffset+1.

I'd actually forgotten to add the symbol table / string pool sections to the ordering after rebasing, oops. So we do actually have a bunch of __LINKEDIT sections to order in this diff already.

In any case, I've refactored things so that the ordering can be specified a bit more declaratively, though it does add a bit more code... let me know what you think, I can revert to the if-else chain if you'd prefer.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76839/new/

https://reviews.llvm.org/D76839