[llvm-dev] GC for defsym'd symbols in LLD
Fāng-ruì Sòng via llvm-dev
llvm-dev at lists.llvm.org
Thu Dec 5 14:17:06 PST 2019
I have investigated further. My conclusion is that GNU ld does
not do better than lld. Making the --defsym behavior ideal is difficult in
the current framework.
GNU ld has some unintended behaviors.
ld.bfd a.o --defsym 'd=foo' --gc-sections -o a => GNU ld retains .text_foo
ld.bfd a.o --defsym 'd=foo+3' --gc-sections -o a => GNU ld drops .text_foo
ld.bfd a.o --defsym 'd=bar-bar+foo' --gc-sections -o a => GNU ld drops .text_foo
I traced its logic under a debugger. Here is the stack trace:
ld/ldlang.c:lang_gc_sections
bfd/elflink.c:bfd_elf_gc_sections
bfd/elflink.c:_bfd_elf_gc_mark_reloc
...
bfd/elflink.c:_bfd_elf_gc_mark_hook
asection *
_bfd_elf_gc_mark_hook (asection *sec,
...
case bfd_link_hash_defined:
case bfd_link_hash_defweak:
// h->root.u.def.section points to .text_foo for --defsym d=foo, but to
// *ABS* for --defsym d=bar-bar+foo or --defsym d=foo+3.
return h->root.u.def.section;
GNU ld evaluates symbol assignments in multiple passes, and the
representation of a symbol (section+offset) can vary between passes.
In the GC pass, its rule therefore works only for simple expressions like
--defsym d=foo, and not for anything slightly more complex.
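The difference between the three commands can be illustrated with a toy
model. The Python sketch below is not GNU ld code: the names Sym, ABS,
mark_live and symtab_with_d are made up for illustration, the inputs mirror
the a.s example quoted further down, and it simply assumes (as observed in
the debugger) that `d` carries a real section at GC time only for the plain
d=foo form.

```python
from dataclasses import dataclass

ABS = "*ABS*"  # stand-in for the absolute section; not a real input section


@dataclass
class Sym:
    name: str
    section: str  # section the symbol is attached to when GC runs
    offset: int = 0


def mark_live(relocs, symtab, roots):
    """Toy mark phase: follow relocations starting from root sections.

    relocs maps a section name to the symbol names it references.
    Only the section recorded on the symbol is marked, mirroring
    `return h->root.u.def.section` in the hook quoted above.
    """
    live, work = set(), list(roots)
    while work:
        sec = work.pop()
        if sec in live or sec == ABS:
            continue  # *ABS* keeps no input section alive
        live.add(sec)
        for name in relocs.get(sec, []):
            work.append(symtab[name].section)
    return live


def symtab_with_d(d_section, d_offset=0):
    """Hypothetical symbol table for a.o plus one --defsym'd symbol d."""
    return {
        "_start": Sym("_start", ".text"),
        "foo": Sym("foo", ".text_foo"),
        "bar": Sym("bar", ".text_bar"),
        "d": Sym("d", d_section, d_offset),
    }


relocs = {".text": ["d"]}  # _start: movabs $d, %rax

# --defsym d=foo: d is still (.text_foo, 0) when GC runs.
simple = mark_live(relocs, symtab_with_d(".text_foo"), [".text"])
# --defsym d=foo+3 or d=bar-bar+foo: d is seen as *ABS* when GC runs.
complex_ = mark_live(relocs, symtab_with_d(ABS), [".text"])
```

In this model `simple` contains .text_foo while `complex_` does not, which
matches the retain/drop results listed above.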
In lld, it would be difficult to drop the following rule in MarkLive.cpp:
for (StringRef s : script->referencedSymbols)
  markSymbol(symtab->find(s));
The issue can be demonstrated by the following call tree:
LinkerDriver::link
  markLive
    ...
      resolveReloc
        // Defined::section is nullptr for `d` because the assignment d=foo
        // hasn't been evaluated yet.
  writeResult
    Writer<ELFT>::run
      Writer<ELFT>::finalizeSections
        LinkerScript::processSymbolAssignments
          // Symbol section+offset are evaluated here.
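The ordering problem can be sketched the same way. The following Python
model is an illustration under heavy assumptions, not lld code: Defined,
mark_live, process_symbol_assignments and link are simplified stand-ins for
the lld entities named above, and only plain `d=foo` assignments are
modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Defined:
    name: str
    section: Optional[str] = None  # None until the assignment is evaluated


def mark_live(relocs, symtab, roots, referenced_symbols=()):
    """Toy markLive: runs before symbol assignments are evaluated."""
    live = set()
    work = list(roots)
    # Conservative rule: every symbol named in a script is a GC root.
    work += [symtab[s].section for s in referenced_symbols]
    while work:
        sec = work.pop()
        if sec is None or sec in live:
            continue  # None: the symbol has no section yet, nothing to follow
        live.add(sec)
        work += [symtab[s].section for s in relocs.get(sec, [])]
    return live


def process_symbol_assignments(assignments, symtab):
    """Toy processSymbolAssignments: only handles `lhs = rhs_symbol`."""
    for lhs, rhs in assignments.items():
        symtab[lhs].section = symtab[rhs].section


def link(use_rule):
    symtab = {
        "foo": Defined("foo", ".text_foo"),
        "bar": Defined("bar", ".text_bar"),
        "d": Defined("d"),  # --defsym d=foo, not evaluated yet
    }
    relocs = {".text": ["d"]}  # _start: movabs $d, %rax
    assignments = {"d": "foo"}
    referenced = list(assignments.values()) if use_rule else []
    live = mark_live(relocs, symtab, [".text"], referenced)  # GC runs first
    process_symbol_assignments(assignments, symtab)  # too late to affect GC
    return live, symtab["d"].section
```

With `link(True)` the rule keeps .text_foo alive. With `link(False)` the
mark phase sees no section for `d`, so .text_foo is discarded even though
`d` is later resolved to point into it.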
It seems that GitHub issues may be a good place to record the problem. I
just created https://github.com/llvm/llvm-project/issues/52
I wanted to mark it low priority, but there is no such label.
On Wed, Dec 4, 2019 at 8:51 AM Shoaib Meenai <smeenai at fb.com> wrote:
> I completely agree that --defsym foo=bar should keep bar (or more
> precisely the section containing bar) alive if foo is referenced.
>
> My mental model of how --defsym foo=bar behaves is that (assuming bar is a
> defined symbol) we create a symbol foo that points to the same location as
> bar (as in it has the same section + address within that section). Any
> reference to foo should therefore prevent that section from getting garbage
> collected. bar doesn't need to enter the picture directly (and we don't
> need to store any sort of explicit link between foo and bar); its section
> getting preserved just naturally falls out of foo getting preserved.
>
> For example, in Fāng-ruì's movabs example, the symbol _start (which is the
> entry point and therefore a GC root) will have a relocation against d, so d
> will be kept alive too. With --defsym d=foo, the symbol d should point to
> the same section as foo, so that section will be preserved; it doesn't
> matter if the symbol foo itself is preserved (unless there are other
> non-dead references to it, of course, but then those references should
> cause foo to be marked alive as well).
>
> I haven't actually studied how LLD models a defsym though, so my mental
> model might be way off. I apologize for not having done so before replying,
> but it'll be at least a few days before I have the chance to get to that.
> If my mental model is accurate, preserving the needed section for defsym
> should just fall out naturally from it (without needing to give the target
> of a defsym any special treatment), but if not, the whole thing might be
> much more complicated and not worth it.
>
> On 12/4/19, 1:35 AM, "Peter Smith" <peter.smith at linaro.org> wrote:
>
> On Wed, 4 Dec 2019 at 07:05, Fāng-ruì Sòng <maskray at google.com> wrote:
> >
> > On Tue, Dec 3, 2019 at 7:02 PM Shoaib Meenai via llvm-dev
> > <llvm-dev at lists.llvm.org> wrote:
> > >
> > > > LLD treats any symbol referenced from a linker script as a GC root,
> > > > which makes sense. Unfortunately, it also processes --defsym as a
> > > > linker script fragment internally, so all target symbols of a
> > > > --defsym also get treated as GC roots (i.e., if you have something
> > > > like --defsym SRC=TGT, TGT will become a GC root). I believe this to
> > > > be unnecessary for defsym specifically, since you're just aliasing a
> > > > symbol, and if the original or aliased symbols are referenced from
> > > > anywhere, the symbol's section will get preserved anyway. (There's
> > > > also cases where the defsym target can be an expression instead of
> > > > just a symbol name, which I admittedly haven't thought about too
> > > > hard, but I believe the same logic should hold in terms of any needed
> > > > sections getting preserved regardless.) I want to change defsym
> > > > targets specifically to not be considered as GC roots, so that they
> > > > can be dead code eliminated. Does anyone foresee any issues with
> > > > this?
> >
> > % cat a.s
> > .globl _start, foo, bar
> > .text; _start: movabs $d, %rax
> > .section .text_foo,"ax"; foo: ret
> > .section .text_bar,"ax"; bar: nop
> > % as a.s -o a.o
> >
> > % ld.bfd a.o --defsym d=foo --gc-sections -o a => .text_foo is retained
> > % ld.bfd a.o --defsym d=bar --gc-sections -o a => .text_bar is retained
> > % ld.bfd a.o --defsym d=1 --gc-sections -o a => Neither .text_foo nor
> > .text_bar is retained
> > % ld.bfd a.o --defsym c=foo --defsym d=1 --gc-sections -o a => Neither
> > .text_foo nor .text_bar is retained; lld will retain .text_foo.
> >
> > For --defsym from=an_expression_with_to, GNU ld appears to add a
> > reference from 'from' to 'to'. lld's behavior
> > (https://reviews.llvm.org/D34195) is more conservative.
> >
> > If we stop treating script->referencedSymbols as GC roots,
> > instructions like `movabs $d, %rax` will no longer be able to access
> > the intended section. We can tweak our behavior to be like GNU ld, but
> > the additional complexity may not be worthwhile.
>
> I think it would be a step too far for defsym symbol=expression to
> have no effect on GC. I'd expect that something like defsym foo=bar is
> used because some live code refers to foo, but does not refer to bar,
> so ideally we'd like defsym foo=bar to keep bar live. I've seen this
> idiom used in embedded systems in the presence of binary only
> libraries. It is true that the programmer can always go the extra mile
> to force bar to be marked live, however I think the expectation would
> be defsym foo=bar would do it.
>
> I think the GNU ld behaviour is reasonable. If nothing refers to
> either foo or bar then there is no reason to mark them live. On the
> implementation cost-benefit trade off I guess we won't know until
> there is a prototype, and some idea of what implementing it will save
> on a real example.
>
> Peter
>
>
>
--
宋方睿