[llvm-dev] RFC: ELF Autolinking

Thu Mar 14 08:32:15 PDT 2019

Hello,

I've put some comments on the proposal inline. Having had to debug
library selection problems where all the libraries are visible on the
linker command line, I would prefer if people didn't embed
difficult-to-find directives in object files, but I'm guessing in some
languages this is the natural way of adding libraries.

On Thu, 14 Mar 2019 at 13:08, bd1976 llvm via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>
> At Sony we offer autolinking as a feature in our ELF toolchain. We would like to see full support for this feature upstream as there is anecdotal evidence that it would find use beyond Sony.
>

I've not got any use of the existing code. Personally I've not come
across anyone wanting this type of feature, but that is also anecdotal
on my part.

>
> For ELF we need limited autolinking support. Specifically, we only need support for "comment lib" pragmas (https://docs.microsoft.com/en-us/cpp/preprocessor/comment-c-cpp?view=vs-2017) in C/C++ e.g. #pragma comment(lib, "foo"). My suggestion is that we keep the implementation as lean as possible.
>
> Principles to guide the implementation:
> - Developers should be able to easily understand autolinking behavior.
> - Developers should be able to override autolinking from the linker command line.
> - Inputs specified via pragmas should be handled in a general way to allow the same source code to work in different environments.
>
> I would like to propose that we focus on autolinking exclusively and that we divorce the implementation from the idea of "linker options" which, by nature, would tie source code to the vagaries of particular linkers. I don't see much value in supporting other linker operations so I suggest that the binary representation be a mergeable string section (SHF_MERGE, SHF_STRINGS), called .autolink, with custom type SHT_LLVM_AUTOLINK (0x6fff4c04), and SHF_EXCLUDE set (to avoid the contents appearing in the output). The compiler can form this section by concatenating the arguments of the "comment lib" pragmas in the order they are encountered. Partial (-r, -Ur) links can be handled by concatenating .autolink sections with the normal mergeable string section rules. The current .linker-options can remain (or be removed), but "comment lib" pragmas for ELF should be lowered to .autolink, not to .linker-options. This makes sense as there is no linker option that "comment lib" pragmas map directly to. As an example, #pragma comment(lib, "foo") would result in:
>
> .section ".autolink","eMS",@llvm_autolink,1
>         .asciz "foo"
>
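
To make the proposed encoding concrete, here is a minimal sketch of how a linker might split the raw bytes of a .autolink section back into library names. It relies only on what the RFC states (NUL-terminated strings, concatenated in order). The function name parse_autolink and the UTF-8 decoding are assumptions for illustration, not part of the proposal.

```python
def parse_autolink(payload: bytes) -> list[str]:
    """Split a .autolink payload into its NUL-terminated strings."""
    if payload and not payload.endswith(b"\0"):
        raise ValueError("last string is not NUL-terminated")
    # split() leaves one empty item after the final terminator; drop it.
    # Assumption: names are UTF-8; the RFC does not specify an encoding.
    return [s.decode("utf-8") for s in payload.split(b"\0")[:-1]]
```

For the example above the payload would be b"foo\0", which parses to ["foo"]. Sections concatenated from two objects, such as b"foo\0bar\0", yield ["foo", "bar"].
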
> For LTO, equivalent information to the contents of the .autolink section will be written to the IRSymtab so that it is available to the linker for symbol resolution.
>

I'm not sure I understand the bit about "for symbol resolution". I
think that what you mean is that you will encode the autolink section
using symbols instead of as a section, and the linker is expected to
extract this when it reads the symbol table?

> The linker will process the .autolink strings in the following way:
>
> 1. Inputs from the .autolink sections of a relocatable object file are added when the linker decides to include that file (which could itself be in a library) in the link. Autolinked inputs behave as if they were appended to the command line as a group after all other options. As a consequence the set of autolinked libraries are searched last to resolve symbols.
> 2. It is an error if a file cannot be found for a given string.
> 3. Any command line options in effect at the end of the command line parsing apply to autolinked inputs, e.g. --whole-archive.

I've not got any experience of autolinking as a user, so I'm
struggling a bit with this one. I'm guessing that autolinking is
useful because someone can do the equivalent of #include <library.h>
and #pragma comment(lib, "library.so") in the same place without having
to fight the build system. I'm less convinced about --whole-archive as
I think this tends to be a way of structuring the build and would be
best made explicit in the build system. Moreover, what if someone
doesn't want --whole-archive for their autolink, but one is already in
effect? This could be quite difficult to check in a large project.
Personally I'd have the user be explicit in the .autolink whether they
were intending it to be whole-archive or not.

> 4. Duplicate autolinked inputs are ignored.

If we take the issue of --whole-archive off the table does it matter
that there are duplicate libraries? Unresolved symbols will match
against the first library. I guess it might make a difference if this
feature is implemented in ld.bfd and ld.gold, where you'd have to wrap
the libraries in a start-group, end-group, but is this likely to
happen?
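
For reference, points 1 and 4 as I read them can be sketched as below. This is only an illustration under assumptions: the name link_order is hypothetical, duplicates are detected by comparing the raw strings (the RFC does not say whether resolved paths should be compared instead), and inputs already named on the command line are not treated as duplicates.

```python
def link_order(command_line_inputs, autolink_requests):
    """Return inputs in search order: command line first, autolinked last."""
    seen = set()
    autolinked = []
    for name in autolink_requests:
        # Point 4: a repeated autolinked input is ignored.
        if name not in seen:
            seen.add(name)
            autolinked.append(name)
    # Point 1: autolinked inputs behave as if appended after all other inputs.
    return list(command_line_inputs) + autolinked
```

With command line inputs ["main.o", "-lc"] and autolink requests ["foo", "bar", "foo"], this gives ["main.o", "-lc", "foo", "bar"].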

> 5. The linker tries to add a library or relocatable object file from each of the strings in a .autolink section by: first, handling the string as if it was specified on the command line; second, looking for the string in each of the library search paths in turn; third, looking for a lib<string>.a or lib<string>.so (depending on the current mode of the linker) in each of the library search paths.

There is some precedent for including files and libraries from
linker scripts (https://sourceware.org/binutils/docs/ld/File-Commands.html#File-Commands);
these distinguish between "-lfile" and "file". Would this be a
better fit for a ld.bfd interface compatible linker?
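
To check my understanding of the three-step lookup in point 5, here is a rough sketch. The names resolve_autolink, static and exists are hypothetical. I am assuming that "as if specified on the command line" means trying the string as a path, and that "depending on the current mode" mirrors -l (only .a when linking statically, otherwise .so before .a). The exists parameter is injectable so the logic can be tried without a real file system.

```python
import os


def resolve_autolink(name, search_paths, static=False, exists=os.path.exists):
    """Return the first path found for `name`, or None if nothing matches."""
    # First: handle the string as if it had been given on the command line.
    if exists(name):
        return name
    # Second: look for the string itself in each library search path.
    for directory in search_paths:
        candidate = f"{directory}/{name}"
        if exists(candidate):
            return candidate
    # Third: look for lib<name>.so / lib<name>.a in each search path.
    # Assumption: like -l, a static link only considers the .a form.
    suffixes = [".a"] if static else [".so", ".a"]
    for directory in search_paths:
        for suffix in suffixes:
            candidate = f"{directory}/lib{name}{suffix}"
            if exists(candidate):
                return candidate
    return None  # point 2 of the proposal would make this a link error
```

Note that under this reading a plain file called foo in a later search path is chosen ahead of libfoo.so in an earlier one, because step two covers every directory before step three starts. A scheme that distinguished "-lfoo" from "foo" would let the source say which of the two was intended.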

> 6. A new command line option --no-llvm-autolink will tell LLD to ignore the .autolink sections.

Personally I would have thought --no-llvm-autolink would error if it
found a .autolink section, on the grounds that I wanted all the
libraries to be defined on the command-line or linker script rather
than hidden in object files. I would have thought ignoring the
autolink sections would in most cases result in undefined symbols. If
there is a use case for it, perhaps --ignore-llvm-autolink.
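
The difference between the two behaviours I have in mind is small but visible to the user. A minimal sketch, where the function name and the policy names "use", "ignore" and "error" are purely illustrative and do not correspond to any existing linker option:

```python
def apply_autolink_policy(requests, policy="use"):
    """Return the autolinked inputs to add under a given (hypothetical) policy."""
    if policy == "use":
        return list(requests)
    if policy == "ignore":
        # Silently drop the requests; undefined symbols may show up later.
        return []
    if policy == "error":
        # Fail early so every library has to be named explicitly.
        if requests:
            raise RuntimeError("autolink requests found: " + ", ".join(requests))
        return []
    raise ValueError("unknown policy: " + policy)
```

With "ignore", a missing library is only noticed when symbol resolution fails. With "error", the link stops as soon as an object file carrying a .autolink section is seen.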

> Rationale for the above points:
>
> 1. Adding the autolinked inputs last makes the process simple to understand from a developer's perspective. All linkers are able to implement this scheme.
> 2. Erroring for libraries that are not found seems like better behavior than failing the link during symbol resolution.
> 3. It seems useful for the user to be able to apply command line options which will affect all of the autolinked input files. There is a potential problem of surprise for developers, who might not realize that these options would apply to the "invisible" autolinked input files; however, despite the potential for surprise, this is easy for developers to reason about and gives developers the control that they may require.
> 4. Unlike on the command line, it is probably easy to include the same input file twice via pragmas and might be a pain to fix; think of third-party libraries supplied as binaries.
> 5. This algorithm takes into account all of the different ways that ELF linkers find input files. The different search methods are tried by the linker in most obvious to least obvious order.
> 6. I considered adding finer grained control over which .autolink inputs were ignored (e.g. MSVC has /nodefaultlib:<library>); however, I concluded that this is not necessary: if finer control is required developers can recreate the same effect autolinking would have had using command line options.
>
> Thoughts?