[PATCH] D139069: [lld-macho] Ignored aliases to weak symbols should not retain section data

Jez Ng via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Nov 30 19:09:32 PST 2022


int3 created this revision.
int3 added a reviewer: lld-macho.
Herald added a subscriber: jeroen.dobbelaere.
Herald added projects: lld-macho, All.
int3 requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

If we have two files with the same weak symbol like so:

  ltmp0:
  _weak:
    <contents>

and

  ltmp1:
  _weak:
    <contents>

then linking them together should leave only one copy of `<contents>`, not
two. Previously, we would keep both copies because of the
ignored-name `ltmp<N>` symbols (i.e. symbols that start with `l`) -- we
would not coalesce those, so we would treat them as retaining the
contents.

This matters for more than just size -- we are depending upon this
behavior internally for emitting a certain file format. This file
format's header is repeated in each object file, but we want it to
appear just once in our output.

Why can't we avoid emitting those aliases to `_weak`, or reference the
`ltmp<N>` symbols instead of `_weak`? Well, MC actually adds `ltmp<N>`
symbols as part of the assembly-to-binary translation step. So any
codegen at the clang level can't access them.

All that said... this solution is actually kind of hacky. Here, we avoid
creating the ignored-name symbols at parse time. This is acceptable
since we never emit those symbols in our output. However, in ld64, no
aliasing temporary symbol (ignored or otherwise) retains coalesced
data. Implementing that is harder -- we would have to create those
symbols first (so we can emit their names later), and then
ensure the linker correctly shuffles them around when their aliasees get
coalesced.
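
To make the parse-time approach concrete, here is a toy model of the
difference. It is only a sketch and not the code in this patch:
`ToySymbol`, `ToySubsection`, `isIgnoredName`, `countRetainedCopies`, and
`makeExampleInputs` are hypothetical names invented for illustration, and
the "first weak definition wins" rule is a simplifying assumption about
coalescing.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy model only: these types and functions are hypothetical and are not
// lld's real data structures.
struct ToySymbol {
  std::string name;
  bool isWeakDef;
};

// A run of section data plus every symbol that labels its start address.
struct ToySubsection {
  std::vector<ToySymbol> symbols;
  std::string contents;
};

// Ignored-name symbols start with `l`, like the `ltmp<N>` labels MC adds.
bool isIgnoredName(const std::string &name) {
  return !name.empty() && name[0] == 'l';
}

// Counts how many subsections keep their contents after weak coalescing.
// With skipIgnoredNames == false, an `ltmp<N>` alias pins the data even when
// the weak definition next to it loses. With skipIgnoredNames == true, the
// alias is never created, so only the winning weak definition keeps data.
std::size_t countRetainedCopies(const std::vector<ToySubsection> &inputs,
                                bool skipIgnoredNames) {
  std::set<std::string> seenWeakDefs;
  std::size_t retained = 0;
  for (const ToySubsection &subsec : inputs) {
    bool live = false;
    for (const ToySymbol &sym : subsec.symbols) {
      if (skipIgnoredNames && isIgnoredName(sym.name))
        continue; // Assumed parse-time filter: the symbol never exists.
      if (sym.isWeakDef) {
        // First weak definition wins; later duplicates are coalesced away.
        if (seenWeakDefs.insert(sym.name).second)
          live = true;
      } else {
        // A symbol that is not coalesced retains the data it labels.
        live = true;
      }
    }
    if (live)
      ++retained;
  }
  return retained;
}

// The two-file example from the description above.
std::vector<ToySubsection> makeExampleInputs() {
  return {
      {{{"ltmp0", false}, {"_weak", true}}, "<contents>"},
      {{{"ltmp1", false}, {"_weak", true}}, "<contents>"},
  };
}
```

In this model, `countRetainedCopies(makeExampleInputs(), false)` counts two
retained copies (the old behavior), while passing `true` counts one. It
does not model ld64's more general handling of non-ignored local aliases.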

Additionally, ld64 treats these temporary symbols as functionally
equivalent to the weak symbols themselves -- that is, it will emit weak
binds when those non-weak temporary aliases are referenced. We have
imitated this behavior for ignored-name symbols, but implementing it for
local aliases in general seems substantially more difficult. I'm not
sure if any programs actually depend on this behavior though, so maybe
it's a moot point.

Finally, ld64 does all this regardless of whether
`.subsections_via_symbols` is specified. We don't. But again, given how
rare the lack of that directive is (I've only seen it from hand-written
assembly inputs), I don't think we need to worry about it.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D139069

Files:
  lld/MachO/InputFiles.cpp
  lld/MachO/UnwindInfoSection.cpp
  lld/test/MachO/weak-def-alias-ignored.s

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D139069.479138.patch
Type: text/x-patch
Size: 6039 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20221201/52897530/attachment.bin>

