[llvm-dev] lld symbol choice for symbol present in both a shared and a static library, with and without LTO

Mon Jun 17 22:43:58 PDT 2019

On Tue, Jun 18, 2019 at 4:45 AM Eli Friedman <efriedma at quicinc.com> wrote:

> > -----Original Message-----
> > From: Peter Smith <peter.smith at linaro.org>
> > Sent: Monday, June 17, 2019 3:33 AM
> > To: Eli Friedman <efriedma at qualcomm.com>
> > Cc: llvm-dev <llvm-dev at lists.llvm.org>
> > Subject: [EXT] Re: [llvm-dev] lld symbol choice for symbol present in both a
> > shared and a static library, with and without LTO
> >
> > On Fri, 14 Jun 2019 at 20:58, Eli Friedman via llvm-dev
> > <llvm-dev at lists.llvm.org> wrote:
> > >
> > >
> > >
> > > If “obj.o” is built with LTO enabled, and the function is specifically a
> > > runtime function, the behavior is different.  For example, suppose the IR
> > > contains a call to “llvm.memcpy”, and the generated code eventually calls
> > > “memcpy”.  Or suppose the IR contains a “resume” instruction, and the
> > > generated code eventually calls “_Unwind_Resume”.  In this case, the
> > > choice is different: lld always chooses the “memcpy” or “_Unwind_Resume”
> > > from the shared library, ignoring the order the files are specified on
> > > the command-line.  Is this the expected behavior?
> >
> > As I understand it, there is no more selection of members from static
> > libraries after the LTO code-generator has run. In the example from
> > the PR there is no other object with a reference to memcpy so the
> > member containing the static definition is not loaded, leaving only
> > the shared library to match against. I would expect if there were
> > another reference to memcpy from a bitcode file or another ELF file
> > and the static library was before the shared then it would match
> > against that.
> >
> > As to whether this is expected or not, I don't know for certain. One
> > desirable property of not selecting more objects from static libraries
> > is that you are guaranteed not to load any more bitcode files from
> > static libraries, which would either need compiling separately from
> > the other bitcode files, or have the whole compilation done again with
> > the new objects, which could cause more bitcode files to be loaded
> > etc.
>
> For runtime functions defined in bitcode, we avoid the "double-LTO"
> scenario you describe by including them in the LTO link even if we can't
> prove they will be used.  This is the handleLibcall code you pointed out
> (https://github.com/llvm-mirror/lld/blob/master/ELF/Driver.cpp#L1733).  As
> the comment there describes, we don't do this for runtime functions which
> are not defined in bitcode, to avoid other side-effects; instead we resolve
> those symbols after LTO.
>
> For the scenario I'm describing, though, it looks like the key decision
> here is made in SymbolTable::addShared, before handleLibcall and LTO.  If a
> symbol is defined in both a static library and a shared library, and we
> haven't seen a reference to the static library's symbol at that point, we
> throw away the record of the symbol defined in the static library.
>
> Ultimately, I guess the question is what alternatives are possible,
> without breaking the scenarios handleLibcall is supposed to handle.  I see
> a few possibilities here:
>
> 1. Whenever we see any bitcode file, treat it as referencing every
> possible runtime function, even those defined in non-bitcode static
> libraries.  Then we try to resolve the __sync_val_compare_and_swap_8 issue
> from https://reviews.llvm.org/D50475 some other way.
>

That seems technically doable, but how do we know the names of all possible
runtime functions?

> 2. Change the symbol resolution that runs after LTO to use different
> symbol resolution rules from normal non-LTO/before-LTO symbol resolution,
> so it finds the function from the static library instead of the shared
> library.
>

We don't actually do any symbol resolution after LTO. We merge the result of
LTO with the other object files, but no new symbols are expected to appear
after LTO. We could change that assumption, of course, but that is perhaps
too much.

> 3. Change symbol resolution in general to prefer "lazy" symbols from
> static libraries over symbols from shared libraries, even outside LTO.  So
> "static.a shared.so object.o" picks the symbol from static.a, instead of
> shared.so like it does now.
>

This change seems risky.

> 4. We WONTFIX https://bugs.llvm.org/show_bug.cgi?id=42273 .

The other option I can think of is to add a command line option to force
loading a file from a static archive. With `-u`, we can force loading a
member file when a specified name remains undefined after name resolution.
That doesn't work for this case because after LTO `memcpy` is not an
undefined symbol but a library symbol. So, maybe we can define a new option
`-U` to insert a given name as an undefined symbol from the beginning,
which forces the linker to load a member file as soon as it finds one that
defines that name.
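
To make the ordering issue concrete, here is a toy Python model of the resolution rules discussed in this thread. It is only a sketch, not lld's actual code: the `SymbolTable` class, its method names, the `link` helper, and the `force_undefined` parameter (standing in for the proposed `-U` option, which does not exist today) are all hypothetical, and the rules are simplified to the cases mentioned above.

```python
class SymbolTable:
    """Toy model of order-dependent symbol resolution. Not lld's real code."""

    def __init__(self, force_undefined=()):
        # Maps name -> (kind, origin), where kind is one of
        # "undefined", "lazy", "shared", "defined".
        # force_undefined models the hypothetical -U option: the names are
        # undefined from the very beginning, before any input file is read.
        self.symbols = {n: ("undefined", "<command line>") for n in force_undefined}

    def add_undefined(self, name, origin):
        kind, where = self.symbols.get(name, (None, None))
        if kind is None:
            self.symbols[name] = ("undefined", origin)
        elif kind == "lazy":
            # A reference to a lazy symbol pulls the archive member in.
            self.symbols[name] = ("defined", where)
        # Otherwise the existing entry is kept unchanged.

    def add_lazy(self, name, archive):
        kind, _ = self.symbols.get(name, (None, None))
        if kind is None:
            self.symbols[name] = ("lazy", archive)
        elif kind == "undefined":
            # Already referenced: load the member immediately.
            self.symbols[name] = ("defined", archive)

    def add_shared(self, name, dso):
        kind, _ = self.symbols.get(name, (None, None))
        # An unreferenced lazy symbol is thrown away in favor of the DSO's.
        if kind in (None, "undefined", "lazy"):
            self.symbols[name] = ("shared", dso)


def link(inputs, late_refs=(), force_undefined=()):
    """Process inputs in command-line order, then references created by LTO."""
    table = SymbolTable(force_undefined)
    for kind, origin, name in inputs:
        getattr(table, "add_" + kind)(name, origin)
    for name in late_refs:
        table.add_undefined(name, "<lto output>")
    return table.symbols


inputs = [("lazy", "static.a", "memcpy"), ("shared", "shared.so", "memcpy")]
print(link(inputs, late_refs=["memcpy"]))
print(link(inputs, late_refs=["memcpy"], force_undefined=["memcpy"]))
```

In this model the first call resolves `memcpy` to shared.so, because the lazy entry from static.a has already been replaced by the time the LTO-generated reference appears. The second call resolves it to static.a, because the name is undefined up front and the archive member is loaded as soon as it is seen.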