[llvm-dev] GC for defsym'd symbols in LLD
Fāng-ruì Sòng via llvm-dev
llvm-dev at lists.llvm.org
Thu Dec 5 14:17:06 PST 2019
I have investigated further. My conclusion is that GNU ld does not do
better than lld, and that making the --defsym behavior ideal is difficult
in the current framework.
GNU ld has some unintended behaviors.
ld.bfd a.o --defsym 'd=foo' --gc-sections -o a => GNU ld retains .text_foo
ld.bfd a.o --defsym 'd=foo+3' --gc-sections -o a => GNU ld drops .text_foo
ld.bfd a.o --defsym 'd=bar-bar+foo' --gc-sections -o a => GNU ld drops .text_foo
I traced its logic under a debugger. Here is the stack trace:
_bfd_elf_gc_mark_hook (asection *sec,
// It points to .text_foo for --defsym d=foo, but *ABS* for --defsym
// d=bar-bar+foo or --defsym d=foo+3
GNU ld evaluates symbol assignments in multiple passes, and the
representation of a symbol (section+offset) can vary between passes.
In the GC pass, its rule works only for simple expressions like --defsym
d=foo, not for anything even slightly more complex.
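As a rough illustration of that observation, here is a toy model in C++. It
is not GNU ld's code. The names (Value, symbolRef, addValues, subValues,
gcCanRetain) are made up for this sketch, and the rule that any arithmetic
yields an absolute value is an assumption generalized from the two cases I
observed.

```cpp
#include <cassert>
#include <string>

// Toy model only. An empty section name stands for *ABS*.
struct Value {
  std::string section;
  long offset;
};

// A bare symbol reference keeps the section that defines the symbol.
inline Value symbolRef(const std::string &section, long offset) {
  return Value{section, offset};
}

inline Value constant(long n) { return Value{"", n}; }

// Assumption generalized from the observed cases: during the GC pass the
// result of any arithmetic is absolute, so the section is lost.
inline Value addValues(const Value &a, const Value &b) {
  return Value{"", a.offset + b.offset};
}

inline Value subValues(const Value &a, const Value &b) {
  return Value{"", a.offset - b.offset};
}

// The mark hook can only retain a section that the value still names.
inline bool gcCanRetain(const Value &v) { return !v.section.empty(); }

// The three --defsym forms from the experiment above.
inline void checkObservedCases() {
  Value foo = symbolRef(".text_foo", 0);
  Value bar = symbolRef(".text_bar", 0);
  assert(gcCanRetain(foo));                                  // d=foo
  assert(!gcCanRetain(addValues(foo, constant(3))));         // d=foo+3
  assert(!gcCanRetain(addValues(subValues(bar, bar), foo))); // d=bar-bar+foo
}
```

Calling checkObservedCases() from main reproduces the retain/drop pattern
listed above under these assumptions.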
In lld, it would be difficult to drop the following rule in MarkLive.cpp:
for (StringRef s : script->referencedSymbols)
The issue can be demonstrated by the following call tree:
// Defined::section is nullptr for `d` because the assignment d=foo
// hasn't been evaluated yet.
// Symbol section+offset are evaluated here.
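To show why the ordering matters, here is a minimal sketch of a mark phase in
which `d` has no section yet. It is a toy model, not lld's data structures:
Sym, Input, markLive and exampleInput are hypothetical names, and an empty
string stands in for a null section.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model, not lld's data structures.
struct Sym {
  std::string section; // empty: no section yet (assignment not evaluated)
};

struct Input {
  std::map<std::string, Sym> symbols;
  // section name -> symbols that its relocations refer to
  std::map<std::string, std::vector<std::string>> relocs;
};

// Mark every section reachable from the given root symbols.
inline std::set<std::string> markLive(const Input &in,
                                      std::vector<std::string> worklist) {
  std::set<std::string> live;
  while (!worklist.empty()) {
    std::string name = worklist.back();
    worklist.pop_back();
    auto it = in.symbols.find(name);
    if (it == in.symbols.end() || it->second.section.empty())
      continue; // nothing to mark: the symbol has no section at GC time
    const std::string &sec = it->second.section;
    if (!live.insert(sec).second)
      continue; // section already visited
    auto r = in.relocs.find(sec);
    if (r != in.relocs.end())
      for (const std::string &target : r->second)
        worklist.push_back(target);
  }
  return live;
}

// The a.s example with --defsym d=foo, as the GC pass would see it.
inline Input exampleInput() {
  Input in;
  in.symbols["_start"] = Sym{".text"};
  in.symbols["foo"] = Sym{".text_foo"};
  in.symbols["bar"] = Sym{".text_bar"};
  in.symbols["d"] = Sym{""};          // d=foo has not been evaluated yet
  in.relocs[".text"].push_back("d");  // movabs $d, %rax
  return in;
}

inline void checkRoots() {
  Input in = exampleInput();
  // Only _start as a root: the relocation to d cannot reach .text_foo.
  assert(markLive(in, {"_start"}).count(".text_foo") == 0);
  // Treating the referenced symbol foo as a root keeps .text_foo.
  assert(markLive(in, {"_start", "foo"}).count(".text_foo") == 1);
}
```

In this model, dropping the extra root loses .text_foo even though live code
refers to `d`, which is the problem described above.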
It seems that GitHub issues may be a good place to record the problem, so I
just created https://github.com/llvm/llvm-project/issues/52
I wanted to mark it low priority, but there is no such label.
On Wed, Dec 4, 2019 at 8:51 AM Shoaib Meenai <smeenai at fb.com> wrote:
> I completely agree that --defsym foo=bar should keep bar (or more
> precisely the section containing bar) alive if foo is referenced.
> My mental model of how --defsym foo=bar behaves is that (assuming bar is a
> defined symbol) we create a symbol foo that points to the same location as
> bar (as in it has the same section + address within that section). Any
> reference to foo should therefore prevent that section from getting garbage
> collected. bar doesn't need to enter the picture directly (and we don't
> need to store any sort of explicit link between foo and bar); its section
> getting preserved just naturally falls out of foo getting preserved.
> For example, in Fāng-ruì's movabs example, the symbol _start (which is the
> entry point and therefore a GC root) will have a relocation against d, so d
> will be kept alive too. With --defsym d=foo, the symbol d should point to
> the same section as foo, so that section will be preserved; it doesn't
> matter if the symbol foo itself is preserved (unless there are other
> non-dead references to it, of course, but then those references should
> cause foo to be marked alive as well).
> I haven't actually studied how LLD models a defsym though, so my mental
> model might be way off. I apologize for not having done so before replying,
> but it'll be at least a few days before I have the chance to get to that.
> If my mental model is accurate, preserving the needed section for defsym
> should just fall out naturally from it (without needing to give the target
> of a defsym any special treatment), but if not, the whole thing might be
> much more complicated and not worth it.
> On 12/4/19, 1:35 AM, "Peter Smith" <peter.smith at linaro.org> wrote:
> On Wed, 4 Dec 2019 at 07:05, Fāng-ruì Sòng <maskray at google.com> wrote:
> > On Tue, Dec 3, 2019 at 7:02 PM Shoaib Meenai via llvm-dev
> > <llvm-dev at lists.llvm.org> wrote:
> > >
> > > LLD treats any symbol referenced from a linker script as a GC root,
> > > which makes sense. Unfortunately, it also processes --defsym as a
> > > linker script fragment internally, so all target symbols of a --defsym
> > > also get treated as GC roots (i.e., if you have something like
> > > --defsym SRC=TGT, TGT will become a GC root). I believe this to be
> > > unnecessary for defsym specifically, since you're just aliasing a
> > > symbol, and if the original or aliased symbols are referenced from
> > > anywhere, the symbol's section will get preserved anyway. (There's
> > > also cases where the defsym target can be an expression instead of
> > > just a symbol name, which I admittedly haven't thought about too hard,
> > > but I believe the same logic should hold in terms of any needed
> > > sections getting preserved regardless.) I want to change defsym
> > > targets specifically to not be considered as GC roots, so that they
> > > can be dead code eliminated. Does anyone foresee any issues with this?
> > % cat a.s
> > .globl _start, foo, bar
> > .text; _start: movabs $d, %rax
> > .section .text_foo,"ax"; foo: ret
> > .section .text_bar,"ax"; bar: nop
> > % as a.s -o a.o
> > % ld.bfd a.o --defsym d=foo --gc-sections -o a => .text_foo is retained
> > % ld.bfd a.o --defsym d=bar --gc-sections -o a => .text_bar is retained
> > % ld.bfd a.o --defsym d=1 --gc-sections -o a => Neither .text_foo nor
> > .text_bar is retained
> > % ld.bfd a.o --defsym c=foo --defsym d=1 --gc-sections -o a => Neither
> > .text_foo nor .text_bar is retained; lld will retain .text_foo.
> > For --defsym from=an_expression_with_to, GNU ld appears to add a
> > reference from 'from' to 'to'. lld's behavior is more conservative.
> > If we stop treating script->referencedSymbols as GC roots,
> > instructions like `movabs $d, %rax` will no longer be able to access
> > the intended section. We can tweak our behavior to be like GNU ld, but
> > the additional complexity may not be worthwhile.
> I think it would be a step too far for defsym symbol=expression to
> have no effect on GC. I'd expect that something like defsym foo=bar is
> used because some live code refers to foo, but does not refer to bar,
> so ideally we'd like defsym foo=bar to keep bar live. I've seen this
> idiom used in embedded systems in the presence of binary only
> libraries. It is true that the programmer can always go the extra mile
> to force bar to be marked live, however I think the expectation would
> be defsym foo=bar would do it.
> I think the GNU ld behaviour is reasonable. If nothing refers to
> either foo or bar then there is no reason to mark them live. On the
> implementation cost-benefit trade off I guess we won't know until
> there is a prototype, and some idea of what implementing it will save
> on a real example.
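To make the two policies from the quoted discussion concrete, here is a small
sketch that compares them on the a.s example. It is only a model: Defsym,
sectionOf, retainedConservative and retainedEdgeBased are hypothetical names,
the always-live .text section is left out of the results, and the edge-based
policy is my reading of the GNU ld behavior described above, not its code.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one --defsym: `name` is defined by an expression that
// mentions the symbols in `uses` (empty for a constant such as d=1).
struct Defsym {
  std::string name;
  std::vector<std::string> uses;
};

// Section of each ordinary symbol in the a.s example; "" if not such a symbol.
inline std::string sectionOf(const std::string &sym) {
  if (sym == "foo")
    return ".text_foo";
  if (sym == "bar")
    return ".text_bar";
  return "";
}

// Conservative policy: every symbol mentioned by any --defsym is a GC root.
inline std::set<std::string>
retainedConservative(const std::vector<Defsym> &defs) {
  std::set<std::string> live;
  for (const Defsym &d : defs)
    for (const std::string &u : d.uses)
      if (!sectionOf(u).empty())
        live.insert(sectionOf(u));
  return live;
}

// Edge-based policy: start from the symbols that live code references and
// follow a --defsym's uses only once the defined symbol itself is reached.
inline std::set<std::string>
retainedEdgeBased(const std::vector<Defsym> &defs,
                  std::vector<std::string> worklist) {
  std::set<std::string> live, seen;
  while (!worklist.empty()) {
    std::string s = worklist.back();
    worklist.pop_back();
    if (!seen.insert(s).second)
      continue; // symbol already processed
    if (!sectionOf(s).empty())
      live.insert(sectionOf(s));
    for (const Defsym &d : defs)
      if (d.name == s)
        for (const std::string &u : d.uses)
          worklist.push_back(u);
  }
  return live;
}

inline void checkPolicies() {
  // _start references d in every case, so the worklist starts with "d".
  std::vector<Defsym> one = {Defsym{"d", {"foo"}}};                  // d=foo
  std::vector<Defsym> two = {Defsym{"c", {"foo"}}, Defsym{"d", {}}}; // c=foo, d=1
  assert(retainedConservative(one).count(".text_foo") == 1);
  assert(retainedEdgeBased(one, {"d"}).count(".text_foo") == 1);
  assert(retainedConservative(two).count(".text_foo") == 1);
  assert(retainedEdgeBased(two, {"d"}).empty());
}
```

The two policies agree for --defsym d=foo and differ only for the
--defsym c=foo --defsym d=1 case, where nothing refers to c.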