[patch] First step to fix pr11866 during LTO

Thu Sep 5 16:52:16 PDT 2013

On Sep 5, 2013, at 4:26 PM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
>>> In the previous example, LLVM alone cannot tell the difference between F
>>> and G. It needs the linker to call
>>> lto_codegen_add_must_preserve_symbol on the one that is used from a .o
>>> (F) and lto_codegen_add_symtab_symbol on the one that it just wants
>>> to put in a symbol table (G).
>> 
>> You said earlier calling lto_codegen_add_symtab_symbol() enables LTO to do more optimizations.  What would break if it was called on all symbols?
> 
> Calling it *instead of lto_codegen_add_must_preserve* enables more
> optimizations. Calling it on every symbol would, for example, prevent
> internalizing a hidden symbol that the linker knows is not used
> anywhere (including the symbol table).
> 
>> I see how this wires up to the gold plugin.  What I am trying to figure out is:
>> * Does ld64 need to call this new function, and if so, when would it call it?
> 
> It doesn't need to, no. The existing add_must_preserve is sufficient for
> correctness.
> 
>> * If ld64 does not call the new function, is it missing out on optimizations?
> 
> Yes, it would end up with baz and zed in the symbol table of a .dylib when
> LLVM would otherwise be able to drop them.
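To pin down the distinction in the quoted exchange, here is a minimal C model of what each linker request leaves LTO free to do. The enum `symbol_request` and both functions are hypothetical names for illustration, not part of libLTO; the comments only restate what is described above: no call at all lets LTO internalize a symbol, a symtab request blocks that, and must-preserve pins the symbol for native code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: which libLTO call, if any, the linker made for a symbol. */
enum symbol_request {
    REQ_NONE,          /* neither call: unused everywhere, including the symbol table */
    REQ_SYMTAB,        /* lto_codegen_add_symtab_symbol(): only needed in the symbol table */
    REQ_MUST_PRESERVE  /* lto_codegen_add_must_preserve_symbol(): used from a native .o */
};

static bool is_valid_request(enum symbol_request req) {
    return req == REQ_NONE || req == REQ_SYMTAB || req == REQ_MUST_PRESERVE;
}

/* Only a symbol the linker never mentioned may be internalized. */
bool may_internalize(enum symbol_request req) {
    assert(is_valid_request(req));
    return req == REQ_NONE;
}

/* Only must-preserve symbols are pinned by native references; symtab-only
   symbols leave LTO more room to optimize. */
bool pinned_by_native_code(enum symbol_request req) {
    assert(is_valid_request(req));
    return req == REQ_MUST_PRESERVE;
}
```

Calling the symtab variant on every symbol would turn every `REQ_NONE` into `REQ_SYMTAB`, which is exactly the lost internalization mentioned above.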

Ok. I think I finally see what you are getting at. The normal rule when building shared libraries (.dylib) is that all external symbols (not hidden or static) must be kept. For instance, the Darwin linker will not dead strip an unused external function in a dylib (though it would dead strip it in a main executable) because there might be some dynamic client of that symbol.

When that rule is applied to LTO, there are two cases in which the linker calls lto_codegen_add_must_preserve_symbol():
1) the symbol is referenced by native code or the command line, or
2) a dylib is being built and the symbol is external.
You want to differentiate those two cases by continuing to call lto_codegen_add_must_preserve_symbol() for the first case and switching to lto_codegen_add_symtab_symbol() for the second.

But calling lto_codegen_add_must_preserve_symbol() on all external symbols when linking dylibs has always been expensive (lots of calls, and LTO must then map each string name to a Value object). Wouldn’t it be more efficient to have one new function that is called just once:

    void lto_codegen_preserve_global_symbols();

and have the linker stop calling lto_codegen_add_must_preserve_symbol() in case 2? The LTO engine would then use that bit to do what you are doing for all external linkonce_odr symbols.

Alternatively, could the linker just stop calling lto_codegen_add_must_preserve_symbol() on weak external symbols when building dylibs?

-Nick