[llvm-dev] RFC: ELF Autolinking

Rui Ueyama via llvm-dev llvm-dev at lists.llvm.org
Mon Mar 18 13:03:35 PDT 2019


On Fri, Mar 15, 2019 at 6:23 AM bd1976 llvm <bd1976llvm at gmail.com> wrote:

>
>
> On Thu, Mar 14, 2019 at 6:43 PM bd1976 llvm <bd1976llvm at gmail.com> wrote:
>
>>
>>
>> On Thu, Mar 14, 2019 at 5:58 PM Rui Ueyama <ruiu at google.com> wrote:
>>
>>> On Thu, Mar 14, 2019 at 9:45 AM bd1976 llvm via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> On Thu, Mar 14, 2019 at 3:32 PM Peter Smith <peter.smith at linaro.org>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I've put some comments on the proposal inline. Having had to debug
>>>>> library selection problems where all the libraries are visible on the
>>>>> linker command line, I would prefer if people didn't embed difficult
>>>>> to find directives in object files, but I'm guessing in some languages
>>>>> this is the natural way of adding libraries.
>>>>>
>>>>> On Thu, 14 Mar 2019 at 13:08, bd1976 llvm via llvm-dev
>>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>> >
>>>>> > At Sony we offer autolinking as a feature in our ELF toolchain. We
>>>>> would like to see full support for this feature upstream as there is
>>>>> anecdotal evidence that it would find use beyond Sony.
>>>>> >
>>>>>
>>>>> I've not got any use of the existing code. Personally I've not come
>>>>> across anyone wanting this type of feature, but that is also anecdotal
>>>>> on my part.
>>>>>
>>>>> >
>>>>> > For ELF we need limited autolinking support. Specifically, we only
>>>>> need support for "comment lib" pragmas (
>>>>> https://docs.microsoft.com/en-us/cpp/preprocessor/comment-c-cpp?view=vs-2017)
>>>>> in C/C++, e.g. #pragma comment(lib, "foo"). My suggestion is that we keep the
>>>>> implementation as lean as possible.
>>>>> >
>>>>> > Principles to guide the implementation:
>>>>> > - Developers should be able to easily understand autolinking
>>>>> behavior.
>>>>> > - Developers should be able to override autolinking from the linker
>>>>> command line.
>>>>> > - Inputs specified via pragmas should be handled in a general way to
>>>>> allow the same source code to work in different environments.
>>>>> >
>>>>> > I would like to propose that we focus on autolinking exclusively and
>>>>> that we divorce the implementation from the idea of "linker options" which,
>>>>> by nature, would tie source code to the vagaries of particular linkers. I
>>>>> don't see much value in supporting other linker operations so I suggest
>>>>> that the binary representation be a mergable string section (SHF_MERGE,
>>>>> SHF_STRINGS), called .autolink, with custom type SHT_LLVM_AUTOLINK
>>>>> (0x6fff4c04), and SHF_EXCLUDE set (to avoid the contents appearing in the
>>>>> output). The compiler can form this section by concatenating the arguments
>>>>> of the "comment lib" pragmas in the order they are encountered. Partial
>>>>> (-r, -Ur) links can be handled by concatenating .autolink sections with the
>>>>> normal mergeable string section rules. The current .linker-options can
>>>>> remain (or be removed); but, "comment lib" pragmas for ELF should be
>>>>> lowered to .autolink not to .linker-options. This makes sense as there is
>>>>> no linker option that "comment lib" pragmas map directly to. As an example,
>>>>> #pragma comment(lib, "foo") would result in:
>>>>> >
>>>>> > .section ".autolink","eMS",@llvm_autolink,1
>>>>> >         .asciz "foo"
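
As a rough illustration of what a consumer of such a section would see: a string section is a run of NUL-terminated strings laid end to end, one per .asciz directive. The Python sketch below splits raw section bytes back into the pragma arguments. The function name parse_autolink_section, the UTF-8 decoding, and the sample bytes are assumptions made for this example; none of them come from LLD or from the proposal.

```python
def parse_autolink_section(data):
    """Split the raw bytes of a string section into its NUL-terminated entries."""
    if not data:
        return []
    if not data.endswith(b"\0"):
        raise ValueError("string section must end with a NUL terminator")
    # Drop the final terminator so split() does not yield an empty last entry.
    entries = data[:-1].split(b"\0")
    # Assumption: names are UTF-8; the proposal does not specify an encoding.
    return [entry.decode("utf-8") for entry in entries]


# Two pragmas, "foo" then "bar", stored as two .asciz entries back to back.
print(parse_autolink_section(b"foo\0bar\0"))
```

The entries come back in the order the pragmas were encountered, which is the order the proposal asks the compiler to preserve.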
>>>>> >
>>>>> > For LTO, equivalent information to the contents of the .autolink
>>>>> section will be written to the IRSymtab so that it is available to the
>>>>> linker for symbol resolution.
>>>>> >
>>>>>
>>>>> I'm not sure I understand the bit about "for symbol resolution". I
>>>>> think that what you mean is that you will encode the autolink section
>>>>> using symbols instead of as a section, and the linker is expected to
>>>>> extract this when it reads the symbol table?
>>>>>
>>>>>
>>>> Whoops... might have used a bit of a colloquialism there; sorry. All I
>>>> mean is that there will be a method on the IRSymtab that LLD can use to
>>>> retrieve the same set of strings that would be written into the
>>>> .autolink section of the relocatable object files by the backend.
>>>>
>>>>
>>>>> > The linker will process the .autolink strings in the following way:
>>>>> >
>>>>> > 1. Inputs from the .autolink sections of a relocatable object file
>>>>> are added when the linker decides to include that file (which could itself
>>>>> be in a library) in the link. Autolinked inputs behave as if they were
>>>>> appended to the command line as a group after all other options. As a
>>>>> consequence the set of autolinked libraries is searched last to resolve
>>>>> symbols.
>>>>> > 2. It is an error if a file cannot be found for a given string.
>>>>> > 3. Any command line options in effect at the end of the command line
>>>>> parsing apply to autolinked inputs, e.g. --whole-archive.
>>>>>
>>>>> I've not got any experience of autolinking as a user, so I'm
>>>>> struggling a bit with this one. I'm guessing that autolinking is
>>>>> useful because someone can do the equivalent of #include <library.h>
>>>>> and #pragma comment lib "library.so" in the same place without having
>>>>> to fight the build system.
>>>>
>>>>
>>>> Right. Consider that many codebases have multiple build configurations
>>>> and the linker needs to be given the correct version of a library to use
>>>> for the particular build configuration. This is often easier to do using
>>>> the preprocessor than in the build system. Also, if a program is dependent
>>>> on an external library, autolinking allows the library writer to reorganize
>>>> how that library is structured transparently to the users of the library.
>>>> There are notes about utility in
>>>> https://stackoverflow.com/questions/1685206/pragma-commentlib-xxx-lib-equivalent-under-linux
>>>> and
>>>> https://stackoverflow.com/questions/3851956/whats-pragma-comment-lib-lib-glut32-lib?noredirect=1&lq=1
>>>> .
>>>>
>>>>
>>>>> I'm less convinced about --whole-archive as
>>>>> I think this tends to be a way of structuring the build and would be
>>>>> best made explicit in the build system. Moreover, what if someone
>>>>> wants to not use --whole-archive, for their autolink, but one already
>>>>> exists.
>>>>
>>>>
>>>> Then they can specify --no-whole-archive on the end of the command
>>>> line, no?
>>>>
>>>>
>>>>> This could be quite difficult to check with a large project.
>>>>> Personally I'd have the user be explicit in the .autolink whether they
>>>>> were intending it to be whole-archive or not.
>>>>>
>>>>
>>>> I was hoping to avoid this as I want to avoid getting into how to
>>>> specify linker specific options in the frontend. If we dislike the idea
>>>> that the state of the command line parser at the end of the linker command
>>>> line affects the autolinked libraries then I would rather go for a scheme
>>>> in which the default state of the command line parser applies when linking
>>>> the autolinked libraries; however, that seems harder to implement in LLD
>>>> and gives the user less control over autolinking.
>>>>
>>>
>>> I think that handling .autolink'ed files in the default state is
>>> simpler, and it doesn't seem too hard to implement.
>>>
>>
>> Right.. definitely possible to implement. So the trade offs are that it
>> is possibly confusing if options like --whole-archive start applying to the
>> "invisible" autolinked inputs. OTOH why not allow command line options to
>> affect the autolinked inputs? It gives developers some more control at no
>> cost (apart from the possible confusion).
>>
>>
>>>
>>> The other option is to handle autolinked libraries as soon as we find
>>> them, so that if foo.o autolinks libbar, the linker would act as if foo.o
>>> in the command line is followed by -lbar. I'd think that's not too bad or
>>> arguably more straightforward semantics than autolinking everything all at
>>> once at the end.
>>>
>>
>> So I played around with this idea a bit. Some background info:
>>
>> MSVC searches libraries added via "comment lib" pragmas last, after
>> searching all of the libraries specified on the command line; however,
>> symbols that are unresolved when bringing in an object file from a library
>> are searched for in that library first (
>> https://docs.microsoft.com/en-us/cpp/build/reference/link-input-files?view=vs-2017
>> ).
>>
>> In the upstream discussion for autolinking, Cary Coutant offered the
>> following as a good compromise for traditional ELF linkers (
>> http://lists.llvm.org/pipermail/llvm-dev/2018-January/120382.html.):
>>
>> """I think what would work is to insert each requested object or shared
>> library into the link order immediately after the object that requests
>> it, but only if the object hasn't already been inserted and isn't
>> already listed on the command line (i.e., we won't try to load the
>> same file twice); and to search each requested archive library
>> immediately after each object that requests it (of course, because of
>> how library searching works, we would load a given archive member once
>> at most). With this method, libm would be searched after both a.o and
>> b.o, so we'd load any members needed by a.o before b.o, and any
>> remaining members needed by b.o before c.o."""
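
One way to read the quoted scheme is as a rule for computing the order in which inputs are visited. The Python sketch below is a simplified model of that rule only: it treats any name ending in ".a" as an archive (an assumption made here), it does not follow requests made by the requested files themselves, and the function and parameter names are hypothetical.

```python
def link_order(command_line, requests):
    """Return the order in which inputs are visited under the quoted scheme.

    command_line: input files in command-line order.
    requests: maps an object file to the inputs it asks for via autolinking.
    """
    on_command_line = set(command_line)
    inserted = set()
    order = []
    for obj in command_line:
        order.append(obj)
        for requested in requests.get(obj, []):
            if requested.endswith(".a"):
                # Archives are searched again after every object that requests them.
                order.append(requested)
            elif requested not in on_command_line and requested not in inserted:
                # Objects and shared libraries are loaded at most once.
                inserted.add(requested)
                order.append(requested)
    return order


print(link_order(["a.o", "b.o", "c.o"], {"a.o": ["libm.a"], "b.o": ["libm.a"]}))
```

With a.o and b.o both requesting libm.a, the archive shows up after each of them and before c.o, which matches the libm example in the quote.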
>>
>> The problem with what you're suggesting is that with the GNU linkers it is
>> always possible to define "where" in the command line parsing you are.
>> However, for MSVC or LLD it is not always possible. Think of an object file
>> in a library that autolinks foo.a that gets pulled into the link (by an
>> undefined symbol) much later on in the link order. My RFC is careful to try
>> to set out a scheme that all linkers can implement (as much as is possible).
>>
>>>
>>>
>>>>> > 4. Duplicate autolinked inputs are ignored.
>>>>>
>>>>> If we take the issue of --whole-archive off the table does it matter
>>>>> that there are duplicate libraries? Unresolved symbols will match
>>>>> against the first library.
>>>>
>>>>
>>>> It doesn't matter for libraries in LLD; but, it is important for object
>>>> files. I think that this mechanism should be usable for object files and
>>>> libraries. This is common in ELF linkers - for example the --library
>>>> command line option can be used to link object files.
>>>>
>>>>>
>>> Do you actually often link .o files using -l? It seems a bit of a weird use of
>>> the option. To me, it seems better to limit the ability of autolinking to
>>> link against .so or .a.
>>>
>>>
>> I don't personally but it does seem useful to be able to find .o files on
>> the library search paths.
>>
>>
>
> Rui - I'm sure you know everything about MSVC linking already! For others'
> benefit though, MSVC only allows loading of libraries via "comment lib"
> pragmas. It rejects .obj files.
>

I'd think that's a better approach than allowing linking against .o using
autolinking. I can't think of a use case of autolinking against .o, and if
you really need it, you can easily create an archive containing a single
object file.


>
> C:\temp\library_semantics>more msvc_foo.c
> int foo() {return 10;}
>
> C:\temp\library_semantics>cl msvc_foo.c /c
> msvc_foo.c
>
> C:\temp\library_semantics>more msvc.c
> #pragma comment(lib, "msvc_foo.obj")
> int foo ();
> int main () {return foo();}
>
> C:\temp\library_semantics>cl msvc.c
> Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24213.1 for x64
> Copyright (C) Microsoft Corporation.  All rights reserved.
>
> msvc.c
> Microsoft (R) Incremental Linker Version 14.00.24213.1
> Copyright (C) Microsoft Corporation.  All rights reserved.
>
> /out:msvc.exe
> msvc.obj
> msvc_foo.obj : warning LNK4003: invalid library format; library ignored
> msvc.obj : error LNK2019: unresolved external symbol foo referenced in function main
> msvc.exe : fatal error LNK1120: 1 unresolved externals
>
> C:\temp\library_semantics>lib /out:msvc_foo.lib msvc_foo.obj
> Microsoft (R) Library Manager Version 14.00.24213.1
> Copyright (C) Microsoft Corporation.  All rights reserved.
>
> C:\temp\library_semantics>more msvc.c
> #pragma comment(lib, "msvc_foo.lib")
> int foo ();
> int main (){return foo();}
>
> C:\temp\library_semantics>cl msvc.c /link /verbose | grep msvc
> msvc.c
> /out:msvc.exe
> msvc.obj
> Processed /DEFAULTLIB:msvc_foo.lib
>     Searching msvc_foo.lib:
>         Referenced in msvc.obj
>         Loaded msvc_foo.lib(msvc_foo.obj)
>  Processed /DISALLOWLIB:msvcrt.lib
>  Processed /DISALLOWLIB:msvcrtd.lib
>     Searching msvc_foo.lib:
>     Searching msvc_foo.lib:
>     Searching msvc_foo.lib:
>      msvc.obj
>      msvc_foo.lib(msvc_foo.obj)
>
>
> Other interesting MSVC behaviour:
>
> MSVC forms the library name to search for based on the file extension. An
> interesting difference is that on Windows, import libraries and static
> archives share the naming convention <basename>.lib, whereas on Unix
> dynamic libraries are conventionally named lib<basename>.so and static
> archives lib<basename>.a.
>
> #pragma comment(lib, "winmm") -> Searches for "winmm.lib" (doesn't search
> for "winmm")
> #pragma comment(lib, "winmm.lib") -> Searches for "winmm.lib" (doesn't
> search for "winmm.lib.lib")
> #pragma comment(lib, "winmm.lix") -> Searches for "winmm.lix" (doesn't
> search for "winmm.lix.lib")
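
The three cases above are consistent with a simple rule: append ".lib" only when the name has no extension at all. The Python sketch below encodes that rule. It is inferred from these three observations only, not from MSVC documentation, and the function name is hypothetical; names containing directories or leading dots were not tested here.

```python
import ntpath


def msvc_library_file_name(name):
    """Guess the file name LINK searches for, based on the observed cases."""
    _, extension = ntpath.splitext(name)
    # Any existing extension is kept as-is; only bare names get ".lib".
    return name if extension else name + ".lib"


for pragma_argument in ("winmm", "winmm.lib", "winmm.lix"):
    print(pragma_argument, "->", msvc_library_file_name(pragma_argument))
```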
>
> MSVC allows specifying libraries on the command line as just file names or
> by using the /DEFAULTLIB option. In both cases the rules for locating the
> library are the same. If a path is specified with the library name, LINK
> searches for the library in that directory. If no path is specified, LINK
> looks first in the directory that LINK is running from, and then in any
> directories specified in the LIB environment variable, see:
> https://docs.microsoft.com/en-us/cpp/build/reference/dot-lib-files-as-linker-input?view=vs-2017.
> Additionally, LINK will search for any /LIBPATH paths before those
> specified in the LIB environment variable, see:
> https://docs.microsoft.com/en-us/cpp/build/reference/libpath-additional-libpath?view=vs-2017.
> LINK handles libraries specified via "comment lib" pragmas just as if you
> had named them on the command line, see:
> https://docs.microsoft.com/en-us/cpp/preprocessor/comment-c-cpp?view=vs-2017
> .
>
> MSVC rules for resolving symbols from libraries: A library specified with
> /DEFAULTLIB is searched after libraries specified explicitly on the command
> line and before default libraries named in .obj files (see
> https://docs.microsoft.com/en-us/cpp/build/reference/nodefaultlib-ignore-libraries?view=vs-2017
> ).
>
> MSVC allows passing not only libraries to the linker via pragmas but also
> a subset of the linker's command line options (
> https://docs.microsoft.com/en-us/cpp/preprocessor/comment-c-cpp). In
> addition to the documented options MSVC also accepts some undocumented
> options. One of these is /DISALLOWLIB, which allows an object file to
> state that it is incompatible with a given library, see:
> https://stackoverflow.com/questions/761394/what-does-the-disallowlib-message-mean-in-vc-linker-output
> and
> https://stackoverflow.com/questions/3007312/resolving-lnk4098-defaultlib-msvcrt-conflicts-with
> .
>
> One of the options supported is /DEFAULTLIB. This means you can specify
> libraries via pragmas with either #pragma comment(lib, <library>) or
> #pragma comment(linker, "/DEFAULTLIB:<library>").
>
> MSVC has the "/NODEFAULTLIB" option which ignores any /DEFAULTLIB options
> from object files or the command-line. You can also ignore specific
> libraries, with "/NODEFAULTLIB:name.lib".
>
> Both Gold and GNU-ld allow loading of non-library files via -l/--library
> options; but, MSVC only allows adding libraries via its equivalent of the
> -l command:
>
> C:\temp\library_semantics>more msvc_foo.c
> int foo() {return 10;}
>
> C:\temp\library_semantics>cl msvc_foo.c /c
>
> C:\temp\library_semantics>lib /out:msvc_foo.lib msvc_foo.obj
>
> C:\temp\library_semantics>type msvc_main.c
> void main(){}
>
> C:\temp\library_semantics>cl msvc_main.c /link /DEFAULTLIB:foo.obj
>
> /out:msvc_main.exe
> /DEFAULTLIB:foo.obj
> msvc_main.obj
> foo.obj : warning LNK4003: invalid library format; library ignored
>
>
> MSVC also ignores duplicate .objs on the command line:
>
> c:\temp\library_semantics>cl msvc.obj
> /out:msvc.exe
> msvc.obj
>
> c:\temp\library_semantics>cl msvc.obj msvc.obj
> /out:msvc.exe
> msvc.obj
> msvc.obj
> msvc.obj : warning LNK4042: object specified more than once; extras ignored
>
>
>>>>> I guess it might make a difference if this
>>>>> feature is implemented in ld.lld and ld.gold, where you'd have to wrap
>>>>> the libraries in a start-group, end-group, but is this likely to
>>>>> happen?
>>>>>
>>>>
>>>> I would like the design to be such that it could be implemented by GNU.
>>>>
>>>>
>>>>>
>>>>> > 5. The linker tries to add a library or relocatable object file from
>>>>> each of the strings in a .autolink section by: first, handling the string
>>>>> as if it was specified on the command line; second, by looking for the
>>>>> string in each of the library search paths in turn; third, by looking for a
>>>>> lib<string>.a or lib<string>.so (depending on the current mode of the
>>>>> linker) in each of the library search paths.
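
The three-step lookup in point 5 can be sketched as follows. This is a minimal Python model, not LLD code: the function name, the prefer_static flag (standing in for "the current mode of the linker"), and the injectable exists callback are all assumptions made for the example. It returns None when nothing is found, leaving the error required by point 2 to the caller.

```python
import os
import posixpath


def find_autolink_input(name, search_paths, prefer_static=False, exists=os.path.isfile):
    """Resolve one .autolink string to a file path, or None if not found."""
    # Step 1: treat the string exactly like a file named on the command line.
    if exists(name):
        return name
    # Step 2: look for the string itself in each library search path, in order.
    for directory in search_paths:
        candidate = posixpath.join(directory, name)
        if exists(candidate):
            return candidate
    # Step 3: look for lib<name>.a or lib<name>.so, depending on the mode.
    suffix = ".a" if prefer_static else ".so"
    for directory in search_paths:
        candidate = posixpath.join(directory, "lib" + name + suffix)
        if exists(candidate):
            return candidate
    return None


# Hypothetical file system: "foo" exists both as a plain file and as libfoo.so.
files = {"/opt/lib/foo", "/usr/lib/libfoo.so"}
print(find_autolink_input("foo", ["/usr/lib", "/opt/lib"], exists=files.__contains__))
```

In this model a plain file named "foo" anywhere on the search paths wins over libfoo.so, because step 2 runs over every search path before step 3 starts.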
>>>>>
>>>>> There is some precedent for including files and libraries from
>>>>> linkerscripts
>>>>> https://sourceware.org/binutils/docs/ld/File-Commands.html#File-Commands
>>>>> , these distinguish between "-lfile" and "file". Would this be a
>>>>> better fit for a ld.bfd interface compatible linker?
>>>>>
>>>>>
>>>> I was hoping to avoid GNUisms and use a "general" mechanism. MSVC
>>>> source code compatibility is a use case.
>>>>
>>>>
>>>>> > 6. A new command line option --no-llvm-autolink will tell LLD to
>>>>> ignore the .autolink sections.
>>>>>
>>>>> Personally I would have thought --no-llvm-autolink would error if it
>>>>> found a .autolink section, on the grounds that I wanted all the
>>>>> libraries to be defined on the command-line or linker script rather
>>>>> than hidden in object files. I would have thought ignoring the
>>>>> autolink sections would in most cases result in undefined symbols. If
>>>>> there is a use case for it, perhaps --ignore-llvm-autolink.
>>>>>
>>>>>
>>>> The use case that I had in mind is that you need to override
>>>> autolinking. To do so you tell the linker to ignore the embedded
>>>> autolinking information and construct an equivalent command line. I think
>>>> your proposed --ignore-llvm-autolink is a better name for this option
>>>> given the intended semantics.
>>>>
>>>>
>>>>> > Rationale for the above points:
>>>>> >
>>>>> > 1. Adding the autolinked inputs last makes the process simple to
>>>>> understand from a developer's perspective. All linkers are able to implement
>>>>> this scheme.
>>>>> > 2. Erroring for libraries that are not found seems like better
>>>>> behavior than failing the link during symbol resolution.
>>>>> > 3. It seems useful for the user to be able to apply command line
>>>>> options which will affect all of the autolinked input files. There is a
>>>>> potential problem of surprise for developers, who might not realize that
>>>>> these options would apply to the "invisible" autolinked input files;
>>>>> however, despite the potential for surprise, this is easy for developers to
>>>>> reason about and gives developers the control that they may require.
>>>>> > 4. Unlike on the command line it is probably easy to include the
>>>>> same input file twice via pragmas and might be a pain to fix; think of
>>>>> third-party libraries supplied as binaries.
>>>>> > 5. This algorithm takes into account all of the different ways that
>>>>> ELF linkers find input files. The different search methods are tried by the
>>>>> linker in most obvious to least obvious order.
>>>>> > 6. I considered adding finer grained control over which .autolink
>>>>> inputs were ignored (e.g. MSVC has /nodefaultlib:<library>); however, I
>>>>> concluded that this is not necessary: if finer control is required
>>>>> developers can recreate the same effect autolinking would have had using
>>>>> command line options.
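
Taken together, points 1, 4 and 6 of the proposal amount to a small procedure for building the final input list. The Python sketch below models only that part, under stated assumptions: the function and parameter names are hypothetical, each included object is represented just by its list of .autolink strings, and ignore_autolink stands in for the proposed ignore option discussed in this thread.

```python
def append_autolinked(command_line_inputs, autolink_sections, ignore_autolink=False):
    """Return command-line inputs followed by the de-duplicated autolinked ones.

    autolink_sections: one list of strings per object included in the link,
    in the order the objects were included.
    """
    inputs = list(command_line_inputs)
    if ignore_autolink:
        # Point 6: the embedded information is skipped entirely.
        return inputs
    seen = set()
    for section in autolink_sections:
        for name in section:
            # Point 4: duplicate autolinked inputs are ignored.
            if name not in seen:
                seen.add(name)
                # Point 1: autolinked inputs go after everything else.
                inputs.append(name)
    return inputs


print(append_autolinked(["main.o", "-lc"], [["foo", "bar"], ["foo"]]))
```

Here "foo" is requested by two objects but appears once, after the explicit inputs. Resolving each name to a file and reporting missing ones (point 2) would happen afterwards and is left out.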
>>>>> >
>>>>> > Thoughts?
>>>>> >
>>>>> > _______________________________________________
>>>>> > LLVM Developers mailing list
>>>>> > llvm-dev at lists.llvm.org
>>>>> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>
>>>>
>>>