[PATCH] D102964: [lld-macho] Implement cstring deduplication

Jez Ng via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jun 3 12:32:45 PDT 2021


int3 added a comment.

Ah, I should probably have added a bit more motivation. The internal program I've been analyzing has significant size overhead from duplicated CFStrings. These CFStrings are essentially boxed cstrings, with an additional field that needs to be bound by dyld. As such, they bloat not just the `__cfstring` section but also the binding info. I didn't quantify exactly how much of the binding info could be attributed to them, but it seemed significant.

Ultimately, I think we'll have ICF dedup these CFStrings, but in order to do so we must first dedup the cstrings they point to. Hence this diff.
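
To illustrate the dependency, here is a minimal sketch of the idea, not the code in this diff. `CStringTable` and `FakeCFString` are hypothetical names made up for this example, and the sketch ignores alignment entirely. Once identical cstrings map to a single offset, two CFString-like records that wrap the same literal become field-for-field identical, which is the kind of equality an ICF pass would need in order to fold them.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical helper (not lld's API): merges identical strings into one
// output buffer and returns the offset assigned to each input string.
// Alignment requirements are deliberately ignored in this sketch.
struct CStringTable {
  std::string data;                                  // merged section contents
  std::unordered_map<std::string, uint64_t> offsets; // string -> offset in data

  uint64_t add(const std::string &s) {
    auto it = offsets.find(s);
    if (it != offsets.end())
      return it->second; // duplicate: reuse the copy already emitted
    uint64_t off = data.size();
    data += s;
    data.push_back('\0'); // keep each entry NUL-terminated
    offsets.emplace(s, off);
    return off;
  }
};

// Hypothetical stand-in for a CFString record: a "boxed" cstring.
// The real record also carries a field that dyld must bind.
struct FakeCFString {
  uint64_t cstrOffset; // where the underlying cstring lives after merging
  uint64_t length;     // length of the string, excluding the NUL

  bool operator==(const FakeCFString &other) const {
    return cstrOffset == other.cstrOffset && length == other.length;
  }
};

// Builds a record for `s`, deduplicating its cstring through `table`.
inline FakeCFString makeCFString(CStringTable &table, const std::string &s) {
  return FakeCFString{table.add(s), s.size()};
}
```

Without the merging step, two records for the same literal would point at different copies of the string, so a pass comparing record contents would not treat them as identical.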

I'm fine with turning merging off by default for now, until we get it integrated with ICF for a bigger win, and maybe only turning it on together with ICF. How does that sound?

In terms of prioritization, I'd like to keep the implementation of these optimizations simple for now, until we are sure that they are operating correctly. (E.g. as the commit message indicates, I uncovered alignment issues while implementing this, and I'm still not entirely sure this is the best way to handle them.) I think parallelization can wait till we're more certain that the output works...


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102964/new/

https://reviews.llvm.org/D102964


