[llvm-dev] RFC: ELF Autolinking
    bd1976 llvm via llvm-dev
    llvm-dev at lists.llvm.org
    Thu Mar 21 09:49:37 PDT 2019

On Thu, Mar 21, 2019 at 12:06 AM Rui Ueyama <ruiu at google.com> wrote:
> Perhaps there's no one clean way to solve this issue, because previously
> all libraries and object files were explicitly given to the linker via the
> command line, and the order of files on the command line matters. That
> assumes human intervention to work correctly. Now, the autolinking feature
> will add libraries implicitly. Since it's implicit, there will be only one
> way that it works, so sometimes that works and sometimes it doesn't.
>
> It feels to me that we should aim for making it work reasonably well for
> reasonable use cases. By reasonable use cases, I'm thinking of the
> following:
>
>  1. --static option may or may not be given (i.e. we should allow that
> feature for both static linking and dynamic linking.)
>  2. There are no competing defined symbols in a given set of libraries, or
> if they exist, the program owner doesn't care which is linked to their
> program.
>  3. There may be circular dependencies between libraries.
>
> I don't think the above assumptions are too odd. If I had to implement the
> autolinking feature in the GNU linker for the above scenario, I'd probably
> use the following scheme:
>
>  1. While reading object files, memorize libraries that are autolinked
>  2. After linking everything, create a list of files consisting of
> autolinked libraries AND libraries given via the command line
>  3. Visit each file in the list as if they were wrapped in --start-group
> and --end-group.
>
> I'd think the above scheme should work reasonably well. What do you think?
>
Very nice. I agree with your definition of "reasonable" use cases (as I
have said before, I think that restricting autolinking to this
"reasonable" set is actually a feature: it avoids developers ending up with
source code that only works with a particular linker). I also like the
proposal for a GNU implementation; I think it is enough to show that
GNU-like linkers could implement this.
At this point I will try to prototype this up so that people have an
implementation to play with.
I am keen to hear from Saleem (compnerd) on this, as he did the original
.linker-options work.
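
To make the three-step scheme quoted above concrete before a real prototype
exists, here is a minimal sketch in Python. It is a toy model, not linker
code: the Obj class, the link() function and the dictionary-based archive
representation are all hypothetical names invented for illustration, and
symbol resolution is reduced to set operations.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    """Toy relocatable object (hypothetical model, not a real ELF reader)."""
    name: str
    defs: set = field(default_factory=set)       # symbols defined
    undefs: set = field(default_factory=set)     # symbols referenced
    autolink: list = field(default_factory=list) # strings from .autolink


def link(objects, cmdline_libs, archives):
    """Return the names of the object files that end up in the link."""
    included, defined, undefined = [], set(), set()
    autolinked = []  # step 1: memorize autolinked libraries as files are read

    def add(obj):
        included.append(obj.name)
        defined.update(obj.defs)
        undefined.update(obj.undefs)
        for lib in obj.autolink:
            if lib not in autolinked:
                autolinked.append(lib)

    for obj in objects:
        add(obj)

    # Steps 2 and 3: treat command-line and autolinked libraries as one
    # group and rescan it until no further archive member is extracted.
    changed = True
    while changed:
        changed = False
        group = list(cmdline_libs) + [
            lib for lib in autolinked if lib not in cmdline_libs
        ]
        for lib in group:
            for member in archives[lib]:
                if member.name in included:
                    continue
                if member.defs & (undefined - defined):
                    add(member)
                    changed = True
    return included


# pcc's example: foo.o autolinks bar, and libbar.a needs a member of libc.a.
foo = Obj("foo.o", defs={"main"}, undefs={"bar"}, autolink=["bar"])
archives = {
    "c": [Obj("printf.o", defs={"printf"})],
    "bar": [Obj("bar.o", defs={"bar"}, undefs={"printf"})],
}
print(link([foo], ["c"], archives))  # ['foo.o', 'bar.o', 'printf.o']
```

Because the whole list is rescanned as a group, printf.o is still pulled out
of libc.a even though libc.a was named on the command line before libbar.a
became known.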
>
> On Tue, Mar 19, 2019 at 11:02 AM bd1976 llvm <bd1976llvm at gmail.com> wrote:
>
>> On Mon, Mar 18, 2019 at 8:02 PM Rui Ueyama <ruiu at google.com> wrote:
>>
>>> On Thu, Mar 14, 2019 at 1:05 PM bd1976 llvm via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> On Thu, Mar 14, 2019 at 6:27 PM Peter Collingbourne <peter at pcc.me.uk>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, Mar 14, 2019 at 6:08 AM bd1976 llvm via llvm-dev <
>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>>> At Sony we offer autolinking as a feature in our ELF toolchain. We
>>>>>> would like to see full support for this feature upstream as there is
>>>>>> anecdotal evidence that it would find use beyond Sony.
>>>>>>
>>>>>> In general autolinking (https://en.wikipedia.org/wiki/Auto-linking)
>>>>>> allows developers to specify inputs to the linker in their source code.
>>>>>> LLVM and Clang already have support for autolinking on ELF via embedding
>>>>>> strings, which specify linker behavior, into a .linker-options section in
>>>>>> relocatable object files, see:
>>>>>>
>>>>>> RFC -
>>>>>> http://lists.llvm.org/pipermail/llvm-dev/2018-January/120101.html
>>>>>> LLVM -
>>>>>> https://llvm.org/docs/Extensions.html#linker-options-section-linker-options,
>>>>>> https://reviews.llvm.org/D40849
>>>>>> Clang -
>>>>>> https://clang.llvm.org/docs/LanguageExtensions.html#specifying-linker-options-on-elf-targets,
>>>>>> https://reviews.llvm.org/D42758
>>>>>>
>>>>>> However, although support was added to Clang and LLVM, no support has
>>>>>> been implemented in LLD; and, I get the sense, from reading the reviews,
>>>>>> that there wasn't agreement on the implementation when the changes landed.
>>>>>> The original motivation seems to have been to remove the "autolink-extract"
>>>>>> mechanism used by Swift to work around the lack of autolinking support for
>>>>>> ELF. However, looking at the Swift source code, Swift still seems to be
>>>>>> using the "autolink-extract" method.
>>>>>>
>>>>>> So my first question: Are there any users of the current
>>>>>> implementation for ELF?
>>>>>>
>>>>>> Assuming that no one is using the current code, I would like to
>>>>>> suggest a different mechanism for autolinking.
>>>>>>
>>>>>> For ELF we need limited autolinking support. Specifically, we only
>>>>>> need support for "comment lib" pragmas (
>>>>>> https://docs.microsoft.com/en-us/cpp/preprocessor/comment-c-cpp?view=vs-2017)
>>>>>> in C/C++ e.g. #pragma comment(lib, "foo"). My suggestion is that we keep the
>>>>>> implementation as lean as possible.
>>>>>>
>>>>>> Principles to guide the implementation:
>>>>>> - Developers should be able to easily understand autolinking behavior.
>>>>>> - Developers should be able to override autolinking from the linker
>>>>>> command line.
>>>>>> - Inputs specified via pragmas should be handled in a general way to
>>>>>> allow the same source code to work in different environments.
>>>>>>
>>>>>> I would like to propose that we focus on autolinking exclusively and
>>>>>> that we divorce the implementation from the idea of "linker options" which,
>>>>>> by nature, would tie source code to the vagaries of particular linkers. I
>>>>>> don't see much value in supporting other linker operations so I suggest
>>>>>> that the binary representation be a mergeable string section (SHF_MERGE,
>>>>>> SHF_STRINGS), called .autolink, with custom type SHT_LLVM_AUTOLINK
>>>>>> (0x6fff4c04), and SHF_EXCLUDE set (to avoid the contents appearing in the
>>>>>> output). The compiler can form this section by concatenating the arguments
>>>>>> of the "comment lib" pragmas in the order they are encountered. Partial
>>>>>> (-r, -Ur) links can be handled by concatenating .autolink sections with the
>>>>>> normal mergeable string section rules. The current .linker-options can
>>>>>> remain (or be removed); but, "comment lib" pragmas for ELF should be
>>>>>> lowered to .autolink not to .linker-options. This makes sense as there is
>>>>>> no linker option that "comment lib" pragmas map directly to. As an example,
>>>>>> #pragma comment(lib, "foo") would result in:
>>>>>>
>>>>>> .section ".autolink","eMS",@llvm_autolink,1
>>>>>>         .asciz "foo"
>>>>>>
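
As a rough illustration of what a consumer of such a section would do, the
sketch below splits the raw bytes of a string section into its
NUL-terminated entries and merges the entries of several objects while
dropping duplicates. The function names parse_autolink and merge_autolink
are hypothetical, and decoding as UTF-8 is an assumption; the proposal does
not specify an encoding. The proposal also gives no ordering guarantee, so
keeping first-seen order here is just a convenience for deterministic
output.

```python
def parse_autolink(section: bytes) -> list[str]:
    """Split the bytes of a SHF_STRINGS section into its strings."""
    if section and not section.endswith(b"\0"):
        raise ValueError("string section must end with a NUL terminator")
    # split() leaves one empty trailing piece after the final NUL; drop it.
    return [s.decode("utf-8") for s in section.split(b"\0")[:-1]]


def merge_autolink(sections: list[bytes]) -> list[str]:
    """Collect library strings from several objects, ignoring duplicates."""
    seen: dict[str, None] = {}  # dict keeps insertion order
    for data in sections:
        for name in parse_autolink(data):
            seen.setdefault(name, None)
    return list(seen)


# Two objects: one with pragmas for foo and bar, one for bar and baz.
print(merge_autolink([b"foo\0bar\0", b"bar\0baz\0"]))  # ['foo', 'bar', 'baz']
```
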
>>>>>> For LTO, equivalent information to the contents of the .autolink
>>>>>> section will be written to the IRSymtab so that it is available to the
>>>>>> linker for symbol resolution.
>>>>>>
>>>>>> The linker will process the .autolink strings in the following way:
>>>>>>
>>>>>> 1. Inputs from the .autolink sections of a relocatable object file
>>>>>> are added when the linker decides to include that file (which could itself
>>>>>> be in a library) in the link. Autolinked inputs behave as if they were
>>>>>> appended to the command line as a group after all other options. As a
>>>>>> consequence, the set of autolinked libraries is searched last to resolve
>>>>>> symbols.
>>>>>>
>>>>>
>>>>> If we want this to be compatible with GNU linkers, doesn't the
>>>>> autolinked input need to appear at the point immediately after the object
>>>>> file appears in the link? I'm imagining the case where you have a
>>>>> statically linked libc as well as a libbar.a autolinked from a foo.o. The
>>>>> link command line would look like this:
>>>>>
>>>>> ld foo.o -lc
>>>>>
>>>>> Now foo.o autolinks against bar. The command line becomes:
>>>>>
>>>>> ld foo.o -lc -lbar
>>>>>
>>>>
>>>> Actually, I was thinking that on a GNU linker the command line would
>>>> become "ld foo.o -lc -( -lbar -)"; but, this doesn't affect your point.
>>>>
>>>>
>>>>>
>>>>> If libbar.a requires an additional object file from libc.a, it will
>>>>> not be added to the link.
>>>>>
>>>>>
>>>> As it stands all the dependencies of an autolinked library must
>>>> themselves be autolinked. I had imagined that this is a reasonable
>>>> limitation. If not, we need another scheme. I will try to think of some
>>>> motivating examples for this.
>>>>
>>>>
>>>>>> 2. It is an error if a file cannot be found for a given string.
>>>>>> 3. Any command line options in effect at the end of the command line
>>>>>> parsing apply to autolinked inputs, e.g. --whole-archive.
>>>>>> 4. Duplicate autolinked inputs are ignored.
>>>>>>
>>>>>
>>>>> This seems like it would work in GNU linkers, as long as the
>>>>> autolinked file is added to the link immediately after the last mention,
>>>>> rather than the first. Otherwise a command line like:
>>>>>
>>>>> ld foo1.o foo2.o
>>>>>
>>>>> (where foo1.o and foo2.o both autolink bar) could end up looking like:
>>>>>
>>>>> ld foo1.o -lbar foo2.o
>>>>>
>>>>> and you will not link anything from libbar.a that only foo2.o
>>>>> requires. It may end up being simpler to not ignore duplicates.
>>>>>
>>>>
>>>> Correct; but, given that the proposal was to handle the libraries as if
>>>> they are appended to the link line after everything on the command line
>>>> then I think this will work. With deduplication (and the use of SHF_MERGE)
>>>> developers get no ordering guarantees. I claim that this is a feature! My
>>>> rationale is that the order in which libraries are linked affects different
>>>> linkers in different ways (e.g. LLD does not resolve symbols from archives
>>>> in a manner compatible with either the Microsoft linker or the GNU
>>>> linkers). By not allowing the user to control the order, I am essentially
>>>> saying that autolinking is not suitable for libraries that offer competing
>>>> copies of the same symbol. This ties into my argument that "comment lib"
>>>> pragmas should be handled in as "general" a way as possible.
>>>>
>>>
>>> Right. I think if you need a fine control over the link order,
>>> autolinking is not a feature you want to use. Or, in general, if your
>>> program is sensitive to a link order because its source object files have
>>> competing symbols of the same name, it's perhaps unnecessarily fragile.
>>>
>>> That being said, I think you need to address the issue that pcc pointed
>>> out. If you statically link a program `foo` with the following command line
>>>
>>>   ld -o foo foo.o -lc
>>>
>>> , `foo.o` auto-imports libbar.a, and libbar.a depends on libc.a, can
>>> your proposed feature pull out object files needed for libbar.a?
>>>
>>
>> It won't work on GNU linkers. It will work with LLD as LLD has MSVC-like
>> archive handling. However, I would like to make sure that whatever we come
>> up with can be supported in the GNU toolchain.
>>
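
To illustrate why this fails with traditional archive handling, here is a
small Python model of a left-to-right, single-pass link. It is only a
sketch under simplifying assumptions: single_pass_link and the tuple and
list representation of objects and archives are hypothetical, and real
linkers track far more state than two sets of symbol names.

```python
def single_pass_link(inputs):
    """Toy left-to-right link.

    Each input is either an object, given as (name, defs, undefs), or an
    archive, given as a list of such tuples. An archive is only consulted
    at the point where it appears; a member is extracted if it defines a
    symbol that is undefined at that moment.
    """
    included, defined, undefined = [], set(), set()

    def add(name, defs, undefs):
        included.append(name)
        defined.update(defs)
        undefined.update(undefs)

    for item in inputs:
        if isinstance(item, tuple):
            add(*item)
            continue
        progress = True
        while progress:  # rescan this one archive while it keeps helping
            progress = False
            for name, defs, undefs in item:
                if name not in included and defs & (undefined - defined):
                    add(name, defs, undefs)
                    progress = True
    return included, undefined - defined


foo_o = ("foo.o", {"main"}, {"bar"})
libc = [("printf.o", {"printf"}, set())]
libbar = [("bar.o", {"bar"}, {"printf"})]

# "ld foo.o -lc -lbar": libc.a has already been passed when bar.o arrives.
print(single_pass_link([foo_o, libc, libbar]))
# (['foo.o', 'bar.o'], {'printf'})
```

In this model the reference to printf stays unresolved because libc.a is
never revisited; placing libbar.a before libc.a, or wrapping both in a
group, avoids that.
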
>> I had thought that it would be acceptable that all the dependencies of an
>> autolinked library must themselves be autolinked in order to work on GNU
>> style linkers. Having thought more, I don't like this limitation -
>> especially as it doesn't exist for Microsoft style linkers. One possible
>> resolution could be that GNU linkers might have to implement another
>> command line option e.g. --auto-dep=<file> to allow injection into the
>> group of autolinked libraries.
>>
>> i.e. in pcc's example you would need to do: "ld foo.o --auto-dep=libc.a"
>> which would become "ld --start-group libbar.a libc.a --end-group" with
>> autolinking.
>>
>> I wanted to avoid the approach of inserting autolinked libraries after
>> the object that autolinks them. In LLD (and MSVC) it becomes hard to reason
>> about "where" the linker is in the command line and it would also mean that
>> we can't have the nice separation between parsing the command line and
>> doing the rest of the link that we currently have. Also, if you give people
>> a way to have a fine grained control over the link order with autolinking
>> you risk ending up with source code that will link on GNU style linkers but
>> not with LLD (assuming GNU ever implemented support for autolinking).
>>
>> Scenario:
>>
>> libbar.a(bar.o) - defines symbol bar
>> libfoo.a(foo.o) - defines foo and autolinks libbar.a
>> main.o - references foo
>> another.o - does not reference foo
>> No references to bar exist
>>
>> With pcc's scheme, "lld -lfoo another.o --whole-archive main.o" becomes
>> "lld -lfoo another.o --whole-archive main.o -lbar" with autolinking.
>> Result: bar.o gets added to the link.
>>
>> But, if a change is made so that another.o references foo, then the link
>> line with autolinking becomes "lld -lfoo another.o -lbar --whole-archive
>> main.o". Result: bar.o is not added to the link.
>>
>> Hopefully the above scenario demonstrates why I think that it becomes too
>> complicated to reason about the effects of autolinking with pcc's proposed
>> insertion scheme.
>>
>>
>>
>>>>>> 5. The linker tries to add a library or relocatable object file from
>>>>>> each of the strings in a .autolink section by: first, handling the string
>>>>>> as if it was specified on the command line; second, looking for the
>>>>>> string in each of the library search paths in turn; third, looking for a
>>>>>> lib<string>.a or lib<string>.so (depending on the current mode of the
>>>>>> linker) in each of the library search paths.
>>>>>>
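
The three-stage lookup in point 5 could look roughly like the Python sketch
below. The function name find_autolink_input and its parameters are
hypothetical. How "the current mode of the linker" selects between .so and
.a is an assumption here: the sketch tries lib<string>.so before
lib<string>.a in each directory unless static is set, which mirrors the
usual -l behavior but is not spelled out in the proposal.

```python
from pathlib import Path
from typing import Optional


def find_autolink_input(
    name: str, search_paths: list[str], static: bool = False
) -> Optional[Path]:
    """Resolve one .autolink string to a file, or return None."""
    # First: treat the string as if it were given on the command line.
    direct = Path(name)
    if direct.is_file():
        return direct

    # Second: look for the literal string in each library search path.
    for directory in search_paths:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate

    # Third: look for lib<string>.so / lib<string>.a in each search path.
    # (Assumption: static mode considers only .a files.)
    suffixes = [".a"] if static else [".so", ".a"]
    for directory in search_paths:
        for suffix in suffixes:
            candidate = Path(directory) / f"lib{name}{suffix}"
            if candidate.is_file():
                return candidate

    return None  # per point 2, the caller would report this as an error
```

A caller would turn a None result into a link error, as point 2 requires,
rather than deferring the failure to symbol resolution.
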
>>>>>
>>>>> Is the second part necessary? "-l:foo" causes the linker to search for
>>>>> a file named "foo" in the library search path, so it seems that allowing
>>>>> the autolink string to look like ":foo" would satisfy this use case.
>>>>>
>>>>
>>>>
>>>> I worded the proposal to avoid mapping "comment lib" pragmas to
>>>> --library command line options. My reasons:
>>>>
>>>> 1. I find the requirement that the user put ':' in their lib strings
>>>> slightly awkward. It means that the source code is now coupled to a
>>>> GNU-style linker. So then this isn't merely an ELF linking proposal, it's a
>>>> proposal for ELF toolchains with GNU-like linkers (e.g. the arm linker
>>>> doesn't support the colon prefix
>>>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0474c/Cjahbdei.html
>>>> ).
>>>>
>>>> 2. The syntax is #pragma comment(lib, ...) not #pragma
>>>> linker-option(library, ...) i.e. the only thing this (frankly rather
>>>> bizarre) syntax definitely implies is that the argument is related to
>>>> libraries (and comments ¯\_(ツ)_/¯); it is a bit of a stretch to interpret
>>>> "comment lib" pragmas as mapping directly to "specifying an additional
>>>> --library command line option".
>>>>
>>>> AFAIK all linkers support two ways of specifying inputs; firstly,
>>>> directly on the command line; secondly, with an option with very similar
>>>> semantics to GNU's --library option. I chose a method of finding input
>>>> files that encompasses both methods of specifying a library on the command
>>>> line. I think that this method is actually more intuitive than either the
>>>> method used by the linker script INPUT command or by --library. FWIW, I
>>>> looked into the history of the colon prefix. It was added in
>>>> https://www.sourceware.org/ml/binutils/2007-03/msg00421.html.
>>>> Unfortunately, the rationale given is that it was merely a port of a
>>>> vxworks linker extension. I couldn't trace the history any further than
>>>> that to find the actual design discussion. The linker script command INPUT
>>>> uses a different scheme and the command already had this search order 20
>>>> years ago, which is the earliest version of the GNU linker I have history
>>>> for; again, the rationale is not available.
>>>>
>>>>
>>>>>> 6. A new command line option --no-llvm-autolink will tell LLD to
>>>>>> ignore the .autolink sections.
>>>>>>
>>>>>> Rationale for the above points:
>>>>>>
>>>>>> 1. Adding the autolinked inputs last makes the process simple to
>>>>>> understand from a developer's perspective. All linkers are able to implement
>>>>>> this scheme.
>>>>>> 2. Error-ing for libraries that are not found seems like better
>>>>>> behavior than failing the link during symbol resolution.
>>>>>> 3. It seems useful for the user to be able to apply command line
>>>>>> options which will affect all of the autolinked input files. There is a
>>>>>> potential problem of surprise for developers, who might not realize that
>>>>>> these options would apply to the "invisible" autolinked input files;
>>>>>> however, despite the potential for surprise, this is easy for developers to
>>>>>> reason about and gives developers the control that they may require.
>>>>>> 4. Unlike on the command line, it is probably easy to include the same
>>>>>> input file twice via pragmas, and this might be a pain to fix; think of
>>>>>> third-party libraries supplied as binaries.
>>>>>> 5. This algorithm takes into account all of the different ways that
>>>>>> ELF linkers find input files. The different search methods are tried by the
>>>>>> linker in order from most obvious to least obvious.
>>>>>> 6. I considered adding finer grained control over which .autolink
>>>>>> inputs were ignored (e.g. MSVC has /nodefaultlib:<library>); however, I
>>>>>> concluded that this is not necessary: if finer control is required
>>>>>> developers can recreate the same effect autolinking would have had using
>>>>>> command line options.
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> llvm-dev at lists.llvm.org
>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> --
>>>>> Peter
>>>>>
>>>>
>>>