[llvm-dev] RFC: ELF Autolinking
Rui Ueyama via llvm-dev
llvm-dev at lists.llvm.org
Mon Mar 18 13:02:05 PDT 2019
On Thu, Mar 14, 2019 at 1:05 PM bd1976 llvm via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> On Thu, Mar 14, 2019 at 6:27 PM Peter Collingbourne <peter at pcc.me.uk>
> wrote:
>
>>
>>
>> On Thu, Mar 14, 2019 at 6:08 AM bd1976 llvm via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> At Sony we offer autolinking as a feature in our ELF toolchain. We would
>>> like to see full support for this feature upstream as there is anecdotal
>>> evidence that it would find use beyond Sony.
>>>
>>> In general autolinking (https://en.wikipedia.org/wiki/Auto-linking)
>>> allows developers to specify inputs to the linker in their source code.
>>> LLVM and Clang already have support for autolinking on ELF via embedding
>>> strings, which specify linker behavior, into a .linker-options section in
>>> relocatable object files, see:
>>>
>>> RFC - http://lists.llvm.org/pipermail/llvm-dev/2018-January/120101.html
>>> LLVM -
>>> https://llvm.org/docs/Extensions.html#linker-options-section-linker-options,
>>> https://reviews.llvm.org/D40849
>>> Clang -
>>> https://clang.llvm.org/docs/LanguageExtensions.html#specifying-linker-options-on-elf-targets,
>>> https://reviews.llvm.org/D42758
>>>
>>> However, although support was added to Clang and LLVM, no support has
>>> been implemented in LLD; and, I get the sense, from reading the reviews,
>>> that there wasn't agreement on the implementation when the changes landed.
>>> The original motivation seems to have been to remove the "autolink-extract"
>>> mechanism used by Swift to work around the lack of autolinking support for
>>> ELF. However, looking at the Swift source code, Swift still seems to be
>>> using the "autolink-extract" method.
>>>
>>> So my first question: Are there any users of the current implementation
>>> for ELF?
>>>
>>> Assuming that no one is using the current code, I would like to suggest
>>> a different mechanism for autolinking.
>>>
>>> For ELF we need limited autolinking support. Specifically, we only need
>>> support for "comment lib" pragmas (
>>> https://docs.microsoft.com/en-us/cpp/preprocessor/comment-c-cpp?view=vs-2017)
>>> in C/C++, e.g. #pragma comment(lib, "foo"). My suggestion is that we keep
>>> the implementation as lean as possible.
>>>
>>> Principles to guide the implementation:
>>> - Developers should be able to easily understand autolinking behavior.
>>> - Developers should be able to override autolinking from the linker
>>> command line.
>>> - Inputs specified via pragmas should be handled in a general way to
>>> allow the same source code to work in different environments.
>>>
>>> I would like to propose that we focus on autolinking exclusively and
>>> that we divorce the implementation from the idea of "linker options" which,
>>> by nature, would tie source code to the vagaries of particular linkers. I
>>> don't see much value in supporting other linker operations so I suggest
>>> that the binary representation be a mergeable string section (SHF_MERGE,
>>> SHF_STRINGS), called .autolink, with custom type SHT_LLVM_AUTOLINK
>>> (0x6fff4c04), and SHF_EXCLUDE set (to avoid the contents appearing in the
>>> output). The compiler can form this section by concatenating the arguments
>>> of the "comment lib" pragmas in the order they are encountered. Partial
>>> (-r, -Ur) links can be handled by concatenating .autolink sections with the
>>> normal mergeable string section rules. The current .linker-options can
>>> remain (or be removed); but, "comment lib" pragmas for ELF should be
>>> lowered to .autolink not to .linker-options. This makes sense as there is
>>> no linker option that "comment lib" pragmas map directly to. As an example,
>>> #pragma comment(lib, "foo") would result in:
>>>
>>> .section ".autolink","eMS",@llvm_autolink,1
>>> .asciz "foo"
>>>
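To make the proposed encoding concrete, here is a minimal Python sketch of how a consumer might read such a section. It assumes only what the proposal states: the payload is a run of NUL-terminated strings (SHF_STRINGS), and duplicate entries carry no meaning. The names `parse_autolink` and `merge_autolink` are hypothetical and are not LLVM or LLD APIs.

```python
def parse_autolink(section_bytes):
    """Split the raw contents of a string section into its entries."""
    # A strings section holds NUL-terminated strings stored back to back.
    if section_bytes and not section_bytes.endswith(b"\0"):
        raise ValueError("string section is not NUL-terminated")
    # split() yields one empty trailing piece after the final NUL; drop it.
    return [s.decode("utf-8") for s in section_bytes.split(b"\0")[:-1]]


def merge_autolink(sections):
    """Combine the entries of several .autolink payloads, dropping duplicates."""
    seen = {}  # dict keys keep insertion order, so this acts as an ordered set
    for data in sections:
        for name in parse_autolink(data):
            seen.setdefault(name, None)
    return list(seen)


# Hypothetical payloads from two object files that both request "bar".
print(merge_autolink([b"foo\0bar\0", b"bar\0baz\0"]))
```

The proposal gives no ordering guarantee, so the first-seen order kept here is only a convenience of the sketch.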
>>> For LTO, equivalent information to the contents of the .autolink
>>> section will be written to the IRSymtab so that it is available to the
>>> linker for symbol resolution.
>>>
>>> The linker will process the .autolink strings in the following way:
>>>
>>> 1. Inputs from the .autolink sections of a relocatable object file are
>>> added when the linker decides to include that file (which could itself be
>>> in a library) in the link. Autolinked inputs behave as if they were
>>> appended to the command line as a group after all other options. As a
>>> consequence, the set of autolinked libraries is searched last to resolve
>>> symbols.
>>>
>>
>> If we want this to be compatible with GNU linkers, doesn't the autolinked
>> input need to appear at the point immediately after the object file appears
>> in the link? I'm imagining the case where you have a statically linked libc
>> as well as a libbar.a autolinked from a foo.o. The link command line would
>> look like this:
>>
>> ld foo.o -lc
>>
>> Now foo.o autolinks against bar. The command line becomes:
>>
>> ld foo.o -lc -lbar
>>
>
> Actually, I was thinking that on a GNU linker the command line would
> become "ld foo.o -lc -( -lbar -)"; but, this doesn't affect your point.
>
>
>>
>> If libbar.a requires an additional object file from libc.a, it will not
>> be added to the link.
>>
>>
> As it stands, all the dependencies of an autolinked library must themselves
> be autolinked. I had imagined that this is a reasonable limitation. If not,
> we need another scheme. I will try to think of some motivating examples for
> this.
>
>
>>> 2. It is an error if a file cannot be found for a given string.
>>> 3. Any command line options in effect at the end of the command line
>>> parsing apply to autolinked inputs, e.g. --whole-archive.
>>> 4. Duplicate autolinked inputs are ignored.
>>>
>>
>> This seems like it would work in GNU linkers, as long as the autolinked
>> file is added to the link immediately after the last mention, rather than
>> the first. Otherwise a command line like:
>>
>> ld foo1.o foo2.o
>>
>> (where foo1.o and foo2.o both autolink bar) could end up looking like:
>>
>> ld foo1.o -lbar foo2.o
>>
>> and you will not link anything from libbar.a that only foo2.o requires.
>> It may end up being simpler to not ignore duplicates.
>>
>
> Correct; but, given that the proposal was to handle the libraries as if
> they are appended to the link line after everything on the command line
> then I think this will work. With deduplication (and the use of SHF_MERGE)
> developers get no ordering guarantees. I claim that this is a feature! My
> rationale is that the order in which libraries are linked affects different
> linkers in different ways (e.g. LLD does not resolve symbols from archives
> in a manner compatible with either the Microsoft linker or the GNU
> linkers). By not allowing the user to control the order, I am essentially
> saying that autolinking is not suitable for libraries that offer competing
> copies of the same symbol. This ties into my argument that "comment lib"
> pragmas should be handled in as "general" a way as possible.
>
Right. I think if you need fine control over the link order, autolinking
is not a feature you want to use. More generally, if your program is
sensitive to link order because its object files have competing
symbols of the same name, it's perhaps unnecessarily fragile.

That being said, I think you need to address the issue that pcc pointed
out. Suppose you statically link a program `foo` with the following
command line:

ld -o foo foo.o -lc

If `foo.o` autolinks libbar.a, and libbar.a depends on libc.a, can your
proposed feature pull in the object files from libc.a that libbar.a needs?
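
The scenario can be reproduced with a deliberately simplified Python model of traditional left-to-right archive resolution, in which an archive is consulted only at the point where it appears on the command line. The `link` function, the dictionary layout, and the symbol names are all made up for illustration; real linkers differ in many details.

```python
def link(inputs):
    """Process inputs left to right; return the symbols left undefined."""
    defined, undefined = set(), set()

    def add_object(obj):
        defined.update(obj["defs"])
        undefined.update(obj["undefs"])
        undefined.difference_update(defined)

    for item in inputs:
        if isinstance(item, dict):  # a plain object file: always linked in
            add_object(item)
            continue
        # An archive (a list of members): pull in members that define a
        # currently undefined symbol, repeating until nothing new is pulled.
        members = list(item)
        progress = True
        while progress:
            progress = False
            for member in list(members):
                if member["defs"] & undefined:
                    add_object(member)
                    members.remove(member)
                    progress = True
        # Once we move on, this archive is never revisited.
    return undefined


# Hypothetical inputs: foo.o needs bar(); libbar.a's member needs memcpy().
foo_o = {"defs": {"main"}, "undefs": {"bar"}}
libc = [{"defs": {"memcpy"}, "undefs": set()},
        {"defs": {"printf"}, "undefs": set()}]
libbar = [{"defs": {"bar"}, "undefs": {"memcpy"}}]

print(sorted(link([foo_o, libc, libbar])))        # ld foo.o -lc -lbar     -> ['memcpy']
print(sorted(link([foo_o, libc, libbar, libc])))  # ld foo.o -lc -lbar -lc -> []
```

In this model, appending -lbar after -lc leaves libbar.a's reference into libc.a unresolved unless libc.a is searched again afterwards.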

>>> 5. The linker tries to add a library or relocatable object file from each
>>> of the strings in a .autolink section by: first, handling the string as if
>>> it was specified on the command line; second, by looking for the string in
>>> each of the library search paths in turn; third, by looking for a
>>> lib<string>.a or lib<string>.so (depending on the current mode of the
>>> linker) in each of the library search paths.
>>>
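The three-step lookup in point 5, together with the error in point 2, can be sketched in Python as follows. This is only an illustration under stated assumptions: `find_autolink_input` and its parameters are hypothetical names, and the choice to try lib<string>.so before lib<string>.a when not linking statically is an assumption borrowed from the usual -l behavior rather than something the proposal specifies.

```python
import os


def find_autolink_input(name, search_paths, static=False):
    """Return the path of the input named by an .autolink string."""
    # 1. Handle the string as if it had been given on the command line.
    if os.path.isfile(name):
        return name
    # 2. Look for the string, unmodified, in each library search path.
    for directory in search_paths:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    # 3. Look for lib<name>.so or lib<name>.a in each library search path.
    #    Assumption: shared libraries are preferred unless linking statically.
    suffixes = [".a"] if static else [".so", ".a"]
    for directory in search_paths:
        for suffix in suffixes:
            candidate = os.path.join(directory, "lib" + name + suffix)
            if os.path.isfile(candidate):
                return candidate
    # Point 2 of the proposal: a string that matches no file is an error.
    raise FileNotFoundError("cannot find autolinked input: " + name)
```

A call such as `find_autolink_input("foo", ["/usr/lib"])` would therefore accept a file named foo in the current directory, then /usr/lib/foo, then /usr/lib/libfoo.so, then /usr/lib/libfoo.a.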
>>
>> Is the second part necessary? "-l:foo" causes the linker to search for a
>> file named "foo" in the library search path, so it seems that allowing the
>> autolink string to look like ":foo" would satisfy this use case.
>>
>
>
> I worded the proposal to avoid mapping "comment lib" pragmas to --library
> command line options. My reasons:
>
> 1. I find the requirement that the user put ':' in their lib strings
> slightly awkward. It means that the source code is now coupled to a
> GNU-style linker. So then this isn't merely an ELF linking proposal, it's a
> proposal for ELF toolchains with GNU-like linkers (e.g. the arm linker
> doesn't support the colon prefix
> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0474c/Cjahbdei.html
> ).
>
> 2. The syntax is #pragma comment(lib, ...) not #pragma
> linker-option(library, ...) i.e. the only thing this (frankly rather
> bizarre) syntax definitely implies is that the argument is related to
> libraries (and comments ¯\_(ツ)_/¯); it is a bit of a stretch to interpret
> "comment lib" pragmas as mapping directly to "specifying an additional
> --library command line option".
>
> AFAIK all linkers support two ways of specifying inputs; firstly, directly
> on the command line; secondly, with an option with very similar semantics
> to GNU's --library option. I chose a method of finding input files that
> encompasses both methods of specifying a library on the command line. I
> think that this method is actually more intuitive than either the method
> used by the linker script INPUT command or by --library. FWIW, I looked
> into the history of the colon prefix. It was added in
> https://www.sourceware.org/ml/binutils/2007-03/msg00421.html.
> Unfortunately, the rationale given is that it was merely a port of a
> vxworks linker extension. I couldn't trace the history any further than
> that to find the actual design discussion. The linker script command INPUT
> uses a different scheme and the command already had this search order 20
> years ago, which is the earliest version of the GNU linker I have history
> for; again, the rationale is not available.
>
>
>>> 6. A new command line option --no-llvm-autolink will tell LLD to ignore
>>> the .autolink sections.
>>>
>>> Rationale for the above points:
>>>
>>> 1. Adding the autolinked inputs last makes the process simple to
>>> understand from a developer's perspective. All linkers are able to implement
>>> this scheme.
>>> 2. Raising an error for libraries that are not found seems like better
>>> than failing the link during symbol resolution.
>>> 3. It seems useful for the user to be able to apply command line options
>>> which will affect all of the autolinked input files. There is a potential
>>> problem of surprise for developers, who might not realize that these
>>> options would apply to the "invisible" autolinked input files; however,
>>> despite the potential for surprise, this is easy for developers to reason
>>> about and gives developers the control that they may require.
>>> 4. Unlike on the command line, it is probably easy to include the same
>>> input file twice via pragmas, and this might be a pain to fix; think of
>>> third-party libraries supplied as binaries.
>>> 5. This algorithm takes into account all of the different ways that ELF
>>> linkers find input files. The different search methods are tried by the
>>> linker in order from most obvious to least obvious.
>>> 6. I considered adding finer grained control over which .autolink inputs
>>> were ignored (e.g. MSVC has /nodefaultlib:<library>); however, I concluded
>>> that this is not necessary: if finer control is required developers can
>>> recreate the same effect autolinking would have had using command line
>>> options.
>>>
>>> Thoughts?
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>
>>
>> --
>> --
>> Peter
>>