[PATCH] D103292: [lld-macho] Implement ICF

Greg McGary via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jun 8 09:48:56 PDT 2021


gkm marked 2 inline comments as done.
gkm added a comment.

In D103292#2804387 <https://reviews.llvm.org/D103292#2804387>, @int3 wrote:

> - The logging code is fairly extensive. Is it really necessary to commit it? If we are leaving it in, could we perhaps cut out InputSection::defined and symbolicate addresses by doing a linear/binary search for the matching symbol?

I can remove the logging and retain it as a patch if/when needed.

> - Relatedly: can we have a perf benchmark between LLD-before-this-diff and LLD-with-this-diff-and-ICF-none? I'm wondering if the increased size of the InputSection will cost us.

Yes.

> - "Folding cannot occur across output-section boundaries, therefore ConcatOutputSection is the natural place to run ICF." -- is this not the case for ELF and COFF? Personally I would prefer keeping "core data types" like ConcatOutputSection as simple as possible, as well as to mirror the ELF/COFF ports more closely.

ELF & COFF also fold only within output sections, but the segregation algorithm runs on all input sections in one batch, so output-section boundaries are enforced explicitly during segregation. Output-section boundaries need not be checked for MachO, because we have already sorted all input sections into `ConcatOutputSection`s. Perhaps I should just remove the comment because it is a minor point and might serve only to mislead.



================
Comment at: lld/MachO/ConcatOutputSection.cpp:386
+// the application after dyld applies fixups to pointer data. Some sections are
+// deduplicated elsewhere (__DATA_CONST,__cfstring), and some are synthetic
+// (__DATA_CONST,__got). There are no ICF opportunities here.
----------------
int3 wrote:
> I was hoping that `__cfstring` would be handled seamlessly by ICF once it's integrated with literal merging. (And I think the current implementation should be able to handle it.)
We can experiment with ICF on only `__TEXT,__text` and `__DATA_CONST,__cfstring`.


================
Comment at: lld/MachO/InputFiles.cpp:622-628
+      if (isZeroFill(isec->flags)) {
+        nextIsec->data = {nullptr, isec->data.size() - symbolOffset};
+        isec->data = {nullptr, symbolOffset};
+      } else {
+        nextIsec->data = isec->data.slice(symbolOffset);
+        isec->data = isec->data.slice(0, symbolOffset);
+      }
----------------
int3 wrote:
> why is this change necessary?
Not strictly necessary. I saw SEGVs when checking `isec->data.data() == nullptr` because `parseSymbols` would advance the initial `nullptr` for a zero-fill section at every split. The alternative is to designate `data.data()` as undefined for zero-fill sections.


================
Comment at: lld/MachO/InputSection.h:57
+  // Equivalence-class ID for ICF
+  uint64_t icfEquivalenceClass[2] = {0, 0};
+  // Track the defined symbol that references this - for ICF logs
----------------
int3 wrote:
> I roughly get why we have two entries here, but a comment explaining it would be nice
It is likely best to refer the reader to the block comment in `lld/ELF/ICF.cpp` where the segregation algorithm is documented.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103292/new/

https://reviews.llvm.org/D103292



More information about the llvm-commits mailing list