[PATCH] D77893: [lld] Merge Mach-O input sections

Shoaib Meenai via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 24 14:05:49 PDT 2020


smeenai added a comment.

The change looks good, but it needs a bunch more testing, e.g:

- Various flag merging scenarios (I recognize some of that is in flux right now, but e.g. the pure instructions mergnig)
- Checking for merging text and data sections as well as cstrings, including checking the merging of the contents as well as the symbols
- Perhaps something for the section sorting, since you're changing that (though perhaps the existing tests for that already provide enough coverage)



================
Comment at: lld/MachO/ExportTrie.h:27
   size_t build();
-  void writeTo(uint8_t *buf);
+  void writeTo(uint8_t *buf) const;
 
----------------
@Ktwu, if these changes were incorporated in D76977, would you not need to depend on that? That one's a meatier review than this one, so it'd be easier to get this in first if possible.


================
Comment at: lld/MachO/MergedOutputSection.cpp:33
+void MergedOutputSection::finalize() {
+  uint64_t iaddr = addr;
+  fileSize = 0;
----------------
Super nit: `isecAddr` is a bit more understandable IMO.


================
Comment at: lld/MachO/MergedOutputSection.cpp:36
+
+  for (InputSection *i : inputs) {
+    i->outSecOff = alignTo(iaddr, i->align) - addr;
----------------
Super nit: `InputSection` loop variables are usually `isec`


================
Comment at: lld/MachO/MergedOutputSection.cpp:51
+
+// TODO: this is most likely wrong; reconsider how section flags
+// are actually merged.
----------------
Could you elaborate on what might be wrong in this comment?


================
Comment at: lld/MachO/MergedOutputSection.cpp:65
+
+  // Negate pure instruction presence if any segment isn't pure.
+  uint32_t pureMask = ~(MachO::S_ATTR_PURE_INSTRUCTIONS & inputFlags & flags);
----------------
segment -> section


================
Comment at: lld/MachO/MergedOutputSection.cpp:66
+  // Negate pure instruction presence if any segment isn't pure.
+  uint32_t pureMask = ~(MachO::S_ATTR_PURE_INSTRUCTIONS & inputFlags & flags);
+
----------------
If I'm understanding this correctly, it'll end up negating the pure bit when you don't want it to. If both `inputFlags` and `flags` have the bit set, the result of the AND for that bit will be 1, so it'll be 0 in your mask, so it'll get unset.


================
Comment at: lld/MachO/MergedOutputSection.h:19
+
+class MergedOutputSection : public OutputSection {
+public:
----------------
Could you add a class comment about what this represents?

I'd prefer renaming `OutputSection` to `OutputSectionBase` and `MergedOutputSection` to just `OutputSection`, to be more in line with ELF, but I don't feel super strongly about that.


================
Comment at: lld/MachO/OutputSection.h:20
+
+class OutputSection {
+public:
----------------
Ktwu wrote:
> int3 wrote:
> > How does this class hierarchy compare to that in the ELF implementation? I know they have `SectionBase`, `InputSectionBase`, and `MergeInputSection`... was planning to dig into it eventually but curious if you have looked
> I hadn't looked too closely tbh.
`MergeInputSection` is for the ELF SHF_MERGE concept, where a linker can merge the objects inside a section ... it's used for strings, for example (where the linker can perform string merging and tail merging). I think ld64 just treats certain sections types as mergeable instead of having a special flags for that.


================
Comment at: lld/MachO/OutputSection.h:21
+
+class OutputSection {
+public:
----------------
Can you add a class comment?


================
Comment at: lld/MachO/OutputSection.h:26-27
+  // These accessors will only be valid after finalizing the section.
+  virtual uint64_t getFileOffset() const;
+  virtual uint64_t getSegmentOffset() const;
+
----------------
Why do these need to be virtual?


================
Comment at: lld/MachO/OutputSection.h:41
+  // Some sections may allow coalescing other raw input sections.
+  virtual void mergeInput(InputSection *input);
+
----------------
Is this needed for anything other than MergedOutputSection?


================
Comment at: lld/MachO/OutputSegment.cpp:38
 
-void OutputSegment::addSection(InputSection *isec) {
-  isec->parent = this;
-  std::vector<InputSection *> &vec = sections[isec->name];
-  if (vec.empty() && !isec->isHidden()) {
-    ++numNonHiddenSections;
+size_t OutputSegment::numNonHiddenSections() const {
+  size_t count = 0;
----------------
If this is gonna be called often, we should either do some sort of caching, or else just do the computation as part of `addOutputSection` and make it a public member variable.


================
Comment at: lld/MachO/OutputSegment.cpp:40
+  size_t count = 0;
+  for (auto i : sections) {
+    count += (i.second->isHidden() ? 0 : 1);
----------------
`auto &i`, and same comment about unpacking `i.second`


================
Comment at: lld/MachO/OutputSegment.h:32
 public:
-  InputSection *firstSection() const { return sections.front().second.at(0); }
+  typedef llvm::MapVector<StringRef, OutputSection *> SectionMap;
 
----------------
int3 wrote:
> nit: I think `using SectionMap = ...` is the more modern C++ way (and is the method favored by lld-ELF/COFF)
This one needs to be addressed.


================
Comment at: lld/MachO/OutputSegment.h:73
+    if (it == orderMap.end()) {
+      return &orderMap.find(defaultPosition)->second;
+    }
----------------
Given that this might be the common case, would it make sense to cache the `find` somehow?


================
Comment at: lld/MachO/SyntheticSections.h:165-169
 struct InStruct {
   GotSection *got = nullptr;
 };
 
 extern InStruct in;
----------------
I believe the `in` is short for "globally accessibly input sections". LLD ELF has a corresponding struct called `Out` for output sections (in OutputSections.h), and given that our synthetic sections are output sections now, it might make sense to adopt that name.


================
Comment at: lld/MachO/Writer.cpp:328-330
+    getOrCreateOutputSegment(isec->segname)
+        ->getOrCreateOutputSection(isec->name)
+        ->mergeInput(isec);
----------------
Nit: might be nicer to assign these to variables instead of chaining.


================
Comment at: lld/MachO/Writer.cpp:376
   // dyld requires __LINKEDIT segment to always exist (even if empty).
-  getOrCreateOutputSegment(segment_names::linkEdit);
-  // No more segments can be created after this method runs.
+  auto *linkEditSegment = getOrCreateOutputSegment(segment_names::linkEdit);
+
----------------
Nit: use an explicit type instead of auto


================
Comment at: lld/test/MachO/section-merge.s:3
+# RUN: mkdir -p %t
+# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %p/Inputs/libhello.s \
+# RUN:   -o %t/libhello.o
----------------
It'd be nice to check that the contents of the sections are merged correctly as well. Also, we should be checking that text segments are merged correctly, since that's the most common case.


================
Comment at: lld/test/MachO/section-merge.s:31-32
+.section __TEXT,__text
+.global _goodbye_world
+.global _hello_world
+.global _main
----------------
We aren't defining these symbols in this file, so the `.global` directives aren't doing anything.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77893/new/

https://reviews.llvm.org/D77893





More information about the llvm-commits mailing list