[PATCH] D77893: [lld] Merge Mach-O input sections

Shoaib Meenai via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Apr 30 19:00:19 PDT 2020


smeenai added a comment.

One thing I just thought about (sorry :/). How do we want to handle sections in input files with the same name as our special sections? For example, what if the user gives us an input file with the section `__DATA_CONST,__got`? ld64 appears to handle this fine; it just combines the user-provided section with its own synthesized one. It also handles the case where a user input file has a `__TEXT,__mach_header` section, and treats it as distinct from its own hidden synthesized section with that name.



================
Comment at: lld/MachO/MergedOutputSection.h:19
+
+class MergedOutputSection : public OutputSection {
+public:
----------------
int3 wrote:
> smeenai wrote:
> > Ktwu wrote:
> > > Ktwu wrote:
> > > > int3 wrote:
> > > > > smeenai wrote:
> > > > > > Could you add a class comment about what this represents?
> > > > > > 
> > > > > > I'd prefer renaming `OutputSection` to `OutputSectionBase` and `MergedOutputSection` to just `OutputSection`, to be more in line with ELF, but I don't feel super strongly about that.
> > > > > +1 on the renaming
> > > > Sure.
> > > Er, actually, I like MergedOutputSection because it conveys more information about what kind of OutputSection it is. Given how many of the other classes used don't use -Base in their name -- InputSections, OutputSegments -- it feels clunky to have an OutputSectionBase.
> > > 
> > > @int3 do you feel strongly about renaming this? 
> > Sure, I'm good with the parent class being named `OutputSection` and having specific subclasses representing the different types of output sections. I'm not the biggest fan of the `MergedOutputSection` name because to me it suggests output sections being merged together rather than input sections being merged into a single output section, but I can't think of anything better either, so I'm good with it if we can't come up with a better name.
> > 
> > Naming is hard :D
> I prefer the rename because it more closely parallels what lld-ELF has. Moreover ELF's MergeInputSection is quite different, so it's almost like a false parallel...
> 
> Re conveying information, my mental model has been that OutputSections are by default mergeable, and only the special SyntheticSections aren't, so I don't feel that there's a need to explicitly call out the mergeability.
> 
> `-Base` being a clunky, non-functionally-descriptive suffix is a fair point though...
To be fair, I don't think LLD ELF has an OutputSectionBase. It does have an InputSectionBase though.

The hierarchy is also different cos ELF synthetic sections are input sections (so they can be manipulated by linker scripts), whereas ours are output sections, so it's a bit hard to compare.


================
Comment at: lld/MachO/MergedOutputSection.h:19
+
+/**
+ * Linking multiple files will inevitably mean resolving sections in different
----------------
The comment looks great!

Super nit: LLD always formats these types of comments using single-line comments (`//`) instead of `/* */`, so we should follow suit here.


================
Comment at: lld/MachO/OutputSection.cpp:22
+void OutputSection::mergeInput(InputSection *input) {
+  error("Section " + name + " in segment " + parent->name +
+        " cannot be merged");
----------------
We should only hit this if we have a programming error on our end, right? If so, this should be `llvm_unreachable` instead of `error`.


================
Comment at: lld/MachO/OutputSection.h:21
+
+/**
+ * Output sections represent the finalized sections present within the final
----------------
Comment looks great! Same nit about using single-line comments.


================
Comment at: lld/MachO/OutputSegment.cpp:38
 
-void OutputSegment::addSection(InputSection *isec) {
-  isec->parent = this;
-  std::vector<InputSection *> &vec = sections[isec->name];
-  if (vec.empty() && !isec->isHidden()) {
-    ++numNonHiddenSections;
+size_t OutputSegment::numNonHiddenSections() const {
+  size_t count = 0;
----------------
Ktwu wrote:
> smeenai wrote:
> > If this is gonna be called often, we should either do some sort of caching, or else just do the computation as part of `addOutputSection` and make it a public member variable.
> I believe I hit issues trying to cache this on adding an OutputSection; I think the GotSection made things difficult since its isNeeded() attribute is dynamic.
Ah, makes sense.


================
Comment at: lld/MachO/OutputSegment.cpp:49
+  os->parent = this;
+  sections[os->name] = os;
+}
----------------
Should we assert that `sections[os->name]` doesn't already exist?


================
Comment at: lld/MachO/OutputSegment.cpp:53
+OutputSection *OutputSegment::getOrCreateOutputSection(StringRef name) {
+  OutputSegment::SectionMap::iterator i = sections.find(name);
+  if (i != sections.end()) {
----------------
FWIW, `auto` is fine for iterators, but this is fine too.


================
Comment at: lld/MachO/OutputSegment.h:73
+    if (it == orderMap.end()) {
+      return &orderMap.find(defaultPosition)->second;
+    }
----------------
Ktwu wrote:
> smeenai wrote:
> > Given that this might be the common case, would it make sense to cache the `find` somehow?
> Good idea!
This one isn't addressed, but given that we shouldn't have too many segments (besides the ones already in the map, I can only think of `__DATA`), perhaps this is okay as-is?


================
Comment at: lld/MachO/SyntheticSections.cpp:32
+  // them up when they're made
+  this->name = name;
+  getOrCreateOutputSegment(segname)->addOutputSection(this);
----------------
Super nit: use a member initialization list instead.


================
Comment at: lld/MachO/SyntheticSections.cpp:70
 
-GotSection::GotSection() {
-  segname = "__DATA_CONST";
-  name = "__got";
+GotSection::GotSection() : SyntheticSection("__DATA_CONST", "__got") {
   align = 8;
----------------
@int3 how come these strings are just here directly vs. all the other ones being named constants?

Also, not this diff, but is `__DATA_CONST` correct? ld64 puts this in `__DATA` instead as far as I can see. It's not quite constant cos the dynamic linker's gonna fill it in, though idk if it has the equivalent to ELF's RELRO.


================
Comment at: lld/test/MachO/section-merge.s:15
+# CHECK:          Name: _goodbye_world
+# CHECK-NEXT:     Extern
+# CHECK-NEXT:     Type: Section (0xE)
----------------
We only need to check the properties we care about. Detailed symbol table checking should happen in the symbol table tests. Over here, I think we just care about the symbol name, section, and value, so we can drop the checks for the other fields (Extern, Type, RefType, Flags)

We should also be checking for the text section symbols.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77893/new/

https://reviews.llvm.org/D77893





More information about the llvm-commits mailing list