[llvm-dev] lld symbol choice for symbol present in both a shared and a static library, with and without LTO

Mon Jun 17 03:33:09 PDT 2019

On Fri, 14 Jun 2019 at 20:58, Eli Friedman via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>
> I filed https://bugs.llvm.org/show_bug.cgi?id=42273 last night, about an inconsistency between LTO and non-LTO workflows.
>
>
>
> The basic scenario is that we have an object file which calls a function “foo”, a static library that provides an implementation of “foo”, and a shared library that also provides an implementation of “foo”.  Currently, whether lld chooses the symbol from the static library or the shared library depends on the order the files are specified on the command-line.  For “obj.o static.a shared.so”, or “static.a obj.o shared.so”, lld chooses the symbol from the static library. For any other order, it chooses the symbol from the shared library.  Is this the expected behavior?  (As far as I can tell, this matches binutils ld except for the “static.a obj.o shared.so” case.)
>

That would match my expectations. The symbol tables are loaded in
left-to-right order, so if static.a comes before shared.so its symbols
will be matched against first. In GNU ld, as you point out, once a
library has been passed on the command line its symbols are forgotten,
whereas in LLD they are not; hence the difference with "static.a obj.o
shared.so".
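
To make the ordering rules concrete, here is a toy model of how a single
symbol "foo" ends up resolved for each command-line order. It is only a
sketch of the behaviour described in this thread, not LLD or ld.bfd code:
the function name, the state names, and in particular the rule that a
shared definition replaces a not-yet-extracted archive symbol are
assumptions inferred from the results Eli reported.

```python
from itertools import permutations

def resolve_foo(order, remember_archive_symbols):
    """Toy model: which definition of 'foo' wins for a given input order.

    remember_archive_symbols=True approximates the LLD behaviour described
    above (archive symbols stay in the symbol table as "lazy" entries);
    False approximates GNU ld forgetting an archive once it has moved past it.
    """
    state = None  # None, "undefined", "lazy", "static" or "shared"
    for f in order:
        if f == "obj.o":            # obj.o references foo
            if state is None:
                state = "undefined"
            elif state == "lazy":   # reference pulls in the archive member
                state = "static"
        elif f == "static.a":       # archive member defines foo
            if state == "undefined":
                state = "static"
            elif state is None and remember_archive_symbols:
                state = "lazy"
        elif f == "shared.so":      # shared library defines foo
            if state in (None, "undefined", "lazy"):
                state = "shared"    # assumption: shared beats lazy
    return state

if __name__ == "__main__":
    for order in permutations(["obj.o", "static.a", "shared.so"]):
        print(" ".join(order),
              "| lld-like:", resolve_foo(order, True),
              "| ld-like:", resolve_foo(order, False))
```

Under these assumptions the model gives the static definition for
"obj.o static.a shared.so" in both modes, and for "static.a obj.o
shared.so" only in the LLD-like mode.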

One area where the dynamic library is preferred is when -l or --library is used.
When -lfoo is used and libfoo.a and libfoo.so both exist, both LLD and
ld.bfd will prefer libfoo.so to libfoo.a when searching for the
library, unless -Bstatic is in force at the time.
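
As a rough illustration of that search, the sketch below looks for
libNAME.so and then libNAME.a in each search directory in turn, and
skips the .so when static linking is in force. The function name and
parameters are made up for the example; it is not taken from either
linker's source, and the real search has more cases than this.

```python
import os

def find_library(name, search_dirs, static=False, exists=os.path.exists):
    """Return the path that -l<name> would pick in this simplified model.

    static=True stands in for -Bstatic being in force.
    'exists' is injectable so the function can be tried without real files.
    """
    for d in search_dirs:
        if not static:
            so = f"{d}/lib{name}.so"
            if exists(so):
                return so
        a = f"{d}/lib{name}.a"
        if exists(a):
            return a
    return None  # not found in any search directory

if __name__ == "__main__":
    fake_files = {"/usr/lib/libfoo.so", "/usr/lib/libfoo.a"}
    print(find_library("foo", ["/usr/lib"], exists=fake_files.__contains__))
    print(find_library("foo", ["/usr/lib"], static=True,
                       exists=fake_files.__contains__))
```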

>
>
> If “obj.o” is built with LTO enabled, and the function is specifically a runtime function, the behavior is different.  For example, suppose the IR contains a call to “llvm.memcpy”, and the generated code eventually calls “memcpy”.  Or suppose the IR contains a “resume” instruction, and the generated code eventually calls “_Unwind_Resume”.  In this case, the choice is different: lld always chooses the “memcpy” or “_Unwind_Resume” from the shared library, ignoring the order the files are specified on the command-line.  Is this the expected behavior?

As I understand it, there is no more selection of members from static
libraries after the LTO code-generator has run. In the example from
the PR there is no other object with a reference to memcpy so the
member containing the static definition is not loaded, leaving only
the shared library to match against. I would expect that if there were
another reference to memcpy from a bitcode file or another ELF file,
and the static library came before the shared library, then the
reference would be matched against the static definition.
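
A toy model of that explanation, for the single symbol memcpy. It
assumes, as I describe above, that the reference generated by LTO only
becomes visible after archive member selection has finished; the
function and parameter names are hypothetical, and the "shared replaces
lazy" rule is inferred from the reported behaviour rather than read
from the LLD source.

```python
def resolve_memcpy(order, visible_refs):
    """Toy model of resolving 'memcpy' when obj.o is a bitcode file.

    order:        input files in command-line order.
    visible_refs: files whose reference to memcpy is visible during normal
                  symbol resolution (not only after LTO code generation).
    """
    state = None  # None, "undefined", "lazy", "static" or "shared"
    for f in order:
        if f in visible_refs:
            if state is None:
                state = "undefined"
            elif state == "lazy":
                state = "static"    # reference extracts the archive member
        elif f == "static.a":
            if state == "undefined":
                state = "static"
            elif state is None:
                state = "lazy"      # member is known but not extracted
        elif f == "shared.so":
            if state in (None, "undefined", "lazy"):
                state = "shared"    # assumption: shared beats lazy
    # The reference from LTO-generated code appears only here; in this model
    # it binds to whatever is already in the symbol table and extracts nothing.
    return state

if __name__ == "__main__":
    # Only the LTO-generated call references memcpy: shared wins.
    print(resolve_memcpy(["obj.o", "static.a", "shared.so"], set()))
    # Another ELF object references memcpy before the libraries: static wins.
    print(resolve_memcpy(["obj.o", "other.o", "static.a", "shared.so"],
                         {"other.o"}))
```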

As to whether this is expected or not, I don't know for certain. One
desirable property of not selecting more objects from static libraries
is that you are guaranteed not to load any more bitcode files from
static libraries. Any such files would either need compiling
separately from the other bitcode files, or would require the whole
compilation to be done again with the new objects, which could in turn
cause more bitcode files to be loaded, and so on.
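
To show why that could snowball, here is a hypothetical sketch of the
loop a linker would need if it did keep extracting bitcode members after
code generation. Nothing here corresponds to real LLD code; the
function, the file names and the data layout are invented for the
illustration.

```python
def lto_rounds(initial, archive, refs_after_codegen):
    """Count whole-program compiles needed to reach a fixed point.

    initial:            bitcode files loaded before the first compile.
    archive:            symbol name -> archive member that defines it.
    refs_after_codegen: file -> symbols its generated code references.
    """
    loaded = set(initial)
    rounds = 0
    while True:
        rounds += 1  # recompile everything loaded so far
        needed = set()
        for f in loaded:
            needed |= refs_after_codegen.get(f, set())
        new_members = {archive[s] for s in needed if s in archive} - loaded
        if not new_members:
            return rounds, loaded
        loaded |= new_members  # newly extracted bitcode forces another round

if __name__ == "__main__":
    rounds, loaded = lto_rounds(
        {"main.bc"},
        {"memcpy": "memcpy.bc", "helper": "helper.bc"},
        {"main.bc": {"memcpy"}, "memcpy.bc": {"helper"}},
    )
    print(rounds, sorted(loaded))
```

In this made-up example each newly loaded member references something
defined in yet another member, so the whole compile runs three times
before nothing new is needed.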

There is a comment at
https://github.com/llvm-mirror/lld/blob/master/ELF/Driver.cpp#L1733
which hints at special treatment for functions named in
llvm/IR/RuntimeLibcalls.def; this includes memcpy and _Unwind_Resume. I
don't know enough about LTO to know whether it makes a difference in
this case. It may be worth a look.

Peter

>
>
>
> -Eli
>