[llvm-dev] RFC: ELF Autolinking

bd1976 llvm via llvm-dev llvm-dev at lists.llvm.org
Tue Mar 19 08:17:02 PDT 2019


I'm happy with this. The syntax is pragma comment(*lib*, ...) after all...
so it meets expectations if .o files are rejected. This also promotes
compatibility with MSVC codebases.

On Mon, Mar 18, 2019 at 8:04 PM Rui Ueyama <ruiu at google.com> wrote:

> On Fri, Mar 15, 2019 at 6:23 AM bd1976 llvm <bd1976llvm at gmail.com> wrote:
>
>>
>>
>> On Thu, Mar 14, 2019 at 6:43 PM bd1976 llvm <bd1976llvm at gmail.com> wrote:
>>
>>>
>>>
>>> On Thu, Mar 14, 2019 at 5:58 PM Rui Ueyama <ruiu at google.com> wrote:
>>>
>>>> On Thu, Mar 14, 2019 at 9:45 AM bd1976 llvm via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>> On Thu, Mar 14, 2019 at 3:32 PM Peter Smith <peter.smith at linaro.org>
>>>>> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I've put some comments on the proposal inline. Having had to debug
>>>>>> library selection problems where all the libraries are visible on the
>>>>>> linker command line, I would prefer if people didn't embed difficult
>>>>>> to find directives in object files, but I'm guessing in some languages
>>>>>> this is the natural way of adding libraries.
>>>>>>
>>>>>> On Thu, 14 Mar 2019 at 13:08, bd1976 llvm via llvm-dev
>>>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>>> >
>>>>>> > At Sony we offer autolinking as a feature in our ELF toolchain. We
>>>>>> would like to see full support for this feature upstream as there is
>>>>>> anecdotal evidence that it would find use beyond Sony.
>>>>>> >
>>>>>>
>>>>>> I've not got any use of the existing code. Personally I've not come
>>>>>> across anyone wanting this type of feature, but that is also anecdotal
>>>>>> on my part.
>>>>>>
>>>>>> >
>>>>>> > For ELF we need limited autolinking support. Specifically, we only
>>>>>> need support for "comment lib" pragmas (
>>>>>> https://docs.microsoft.com/en-us/cpp/preprocessor/comment-c-cpp?view=vs-2017)
>>>>>> in C/C++ e.g. #pragma comment(lib, "foo"). My suggestion is that we keep the
>>>>>> implementation as lean as possible.
>>>>>> >
>>>>>> > Principles to guide the implementation:
>>>>>> > - Developers should be able to easily understand autolinking
>>>>>> behavior.
>>>>>> > - Developers should be able to override autolinking from the linker
>>>>>> command line.
>>>>>> > - Inputs specified via pragmas should be handled in a general way
>>>>>> to allow the same source code to work in different environments.
>>>>>> >
>>>>>> > I would like to propose that we focus on autolinking exclusively
>>>>>> and that we divorce the implementation from the idea of "linker options"
>>>>>> which, by nature, would tie source code to the vagaries of particular
>>>>>> linkers. I don't see much value in supporting other linker operations so I
>>>>>> suggest that the binary representation be a mergeable string section
>>>>>> (SHF_MERGE, SHF_STRINGS), called .autolink, with custom type
>>>>>> SHT_LLVM_AUTOLINK (0x6fff4c04), and SHF_EXCLUDE set (to avoid the contents
>>>>>> appearing in the output). The compiler can form this section by
>>>>>> concatenating the arguments of the "comment lib" pragmas in the order they
>>>>>> are encountered. Partial (-r, -Ur) links can be handled by concatenating
>>>>>> .autolink sections with the normal mergeable string section rules. The
>>>>>> current .linker-options can remain (or be removed); but, "comment lib"
>>>>>> pragmas for ELF should be lowered to .autolink not to .linker-options. This
>>>>>> makes sense as there is no linker option that "comment lib" pragmas map
>>>>>> directly to. As an example, #pragma comment(lib, "foo") would result in:
>>>>>> >
>>>>>> > .section ".autolink","eMS",@llvm_autolink,1
>>>>>> >         .asciz "foo"
>>>>>> >
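A minimal sketch of the payload layout described above, assuming the section body is simply the pragma arguments written in order as NUL-terminated strings. The function names are hypothetical, and modelling a -r link as plain concatenation is a simplification (a real linker may also deduplicate strings in a mergeable section).

```python
def encode_autolink(names):
    """Concatenate pragma arguments, in order, as NUL-terminated strings."""
    return b"".join(name.encode("utf-8") + b"\0" for name in names)


def decode_autolink(payload):
    """Split a section payload back into the list of requested names."""
    if not payload:
        return []
    if not payload.endswith(b"\0"):
        raise ValueError("payload is not NUL-terminated")
    return [s.decode("utf-8") for s in payload[:-1].split(b"\0")]


def merge_relocatable(payloads):
    """Assumption: model a -r link as concatenation of the input payloads."""
    return b"".join(payloads)


a = encode_autolink(["foo", "bar"])   # two pragmas in one translation unit
b = encode_autolink(["baz"])          # one pragma in another
print(decode_autolink(merge_relocatable([a, b])))
# ['foo', 'bar', 'baz']
```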
>>>>>> > For LTO, equivalent information to the contents of the .autolink
>>>>>> section will be written to the IRSymtab so that it is available to the
>>>>>> linker for symbol resolution.
>>>>>> >
>>>>>>
>>>>>> I'm not sure I understand the bit about "for symbol resolution". I
>>>>>> think that what you mean is that you will encode the autolink section
>>>>>> using symbols instead of as a section, and the linker is expected to
>>>>>> extract this when it reads the symbol table?
>>>>>>
>>>>>>
>>>>> Whoops... might have used a bit of a colloquialism there; sorry. All I
>>>>> mean is that there will be a method on the IRSymtab that LLD can use to
>>>>> retrieve the same set of strings that would be written into the
>>>>> .autolink section of the relocatable object files by the backend.
>>>>>
>>>>>
>>>>>> > The linker will process the .autolink strings in the following way:
>>>>>> >
>>>>>> > 1. Inputs from the .autolink sections of a relocatable object file
>>>>>> are added when the linker decides to include that file (which could itself
>>>>>> be in a library) in the link. Autolinked inputs behave as if they were
>>>>>> appended to the command line as a group after all other options. As a
>>>>>> consequence the set of autolinked libraries are searched last to resolve
>>>>>> symbols.
>>>>>> > 2. It is an error if a file cannot be found for a given string.
>>>>>> > 3. Any command line options in effect at the end of the command
>>>>>> line parsing apply to autolinked inputs, e.g. --whole-archive.
>>>>>>
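Points 1 and 2 above can be sketched as a toy model. This is not LLD code: the function name and the `exists` callback are hypothetical, and for brevity only inputs named on the command line carry .autolink strings here (objects pulled out of archives are not modelled).

```python
def link_order(command_line, autolink, exists):
    """Return the input order when autolinked inputs are appended last.

    command_line: inputs in command-line order.
    autolink:     dict mapping an input to the strings in its .autolink section.
    exists:       predicate standing in for the linker's file lookup.
    """
    requested = []
    for obj in command_line:
        for name in autolink.get(obj, []):
            if not exists(name):
                # Point 2: a string that cannot be resolved is an error.
                raise FileNotFoundError(f"{obj}: autolinked input {name!r} not found")
            requested.append(name)
    # Point 1: autolinked inputs behave as a group after all other inputs.
    return list(command_line) + requested


inputs = ["a.o", "b.o", "-lc"]
sections = {"a.o": ["m"], "b.o": ["z"]}
print(link_order(inputs, sections, exists=lambda name: True))
# ['a.o', 'b.o', '-lc', 'm', 'z']
```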
>>>>>> I've not got any experience of autolinking as a user, so I'm
>>>>>> struggling a bit with this one. I'm guessing that autolinking is
>>>>>> useful because someone can do the equivalent of #include <library.h>
>>>>>> and #pragma comment lib "library.so" in the same place without having
>>>>>> to fight the build system.
>>>>>
>>>>>
>>>>> Right. Consider that many codebases have multiple build configurations
>>>>> and the linker needs to be given the correct version of a library to use
>>>>> for the particular build configuration. This is often easier to do using
>>>>> the preprocessor than in the build system. Also, if a program is dependent
>>>>> on an external library, autolinking allows the library writer to reorganize
>>>>> how that library is structured transparently to the users of the library.
>>>>> There are notes about utility in
>>>>> https://stackoverflow.com/questions/1685206/pragma-commentlib-xxx-lib-equivalent-under-linux
>>>>> and
>>>>> https://stackoverflow.com/questions/3851956/whats-pragma-comment-lib-lib-glut32-lib?noredirect=1&lq=1
>>>>> .
>>>>>
>>>>>
>>>>>> I'm less convinced about --whole-archive as
>>>>>> I think this tends to be a way of structuring the build and would be
>>>>>> best made explicit in the build system. Moreover, what if someone
>>>>>> wants to not use --whole-archive, for their autolink, but one already
>>>>>> exists.
>>>>>
>>>>>
>>>>> Then they can specify --no-whole-archive on the end of the command
>>>>> line, no?
>>>>>
>>>>>
>>>>>> This could be quite difficult to check with a large project.
>>>>>> Personally I'd have the user be explicit in the .autolink whether they
>>>>>> were intending it to be whole-archive or not.
>>>>>>
>>>>>
>>>>> I was hoping to avoid this as I want to avoid getting into how to
>>>>> specify linker specific options in the frontend. If we dislike the idea
>>>>> that the state of the command line parser at the end of the linker command
>>>>> line affects the autolinked libraries then I would rather go for a scheme
>>>>> in which the default state of the command line parser applies when linking
>>>>> the autolinked libraries; however, that seems harder to implement in LLD
>>>>> and gives the user less control over autolinking.
>>>>>
>>>>
>>>> I think that handling .autolink'ed files in the default state is
>>>> simpler, and it doesn't seem too hard to implement.
>>>>
>>>
>>> Right.. definitely possible to implement. So the trade offs are that it
>>> is possibly confusing if options like --whole-archive start applying to the
>>> "invisible" autolinked inputs. OTOH why not allow command line options to
>>> affect the autolinked inputs? It gives developers some more control at no
>>> cost (apart from the possible confusion).
>>>
>>>
>>>>
>>>> The other option is to handle autolinked libraries as soon as we find
>>>> them, so that if foo.o autolinks libbar, the linker would act as if foo.o
>>>> in the command line is followed by -lbar. I'd think that's not too bad or
>>>> arguably more straightforward semantics than autolinking everything all at
>>>> once at the end.
>>>>
>>>
>>> So I played around with this idea a bit. Some background info:
>>>
>>> MSVC searches libraries added via "comment lib" pragmas last, after
>>> searching all of the libraries specified on the command line; however,
>>> symbols that are unresolved when bringing in an object file from a library
>>> are searched for in that library first (
>>> https://docs.microsoft.com/en-us/cpp/build/reference/link-input-files?view=vs-2017
>>> ).
>>>
>>> In the upstream discussion for autolinking, Cary Coutant offered the
>>> following as a good compromise for traditional ELF linkers (
>>> http://lists.llvm.org/pipermail/llvm-dev/2018-January/120382.html.):
>>>
>>> """I think what would work is to insert each requested object or shared
>>> library into the link order immediately after the object that requests
>>> it, but only if the object hasn't already been inserted and isn't
>>> already listed on the command line (i.e., we won't try to load the
>>> same file twice); and to search each requested archive library
>>> immediately after each object that requests it (of course, because of
>>> how library searching works, we would load a given archive member once
>>> at most). With this method, libm would be searched after both a.o and
>>> b.o, so we'd load any members needed by a.o before b.o, and any
>>> remaining members needed by b.o before c.o."""
>>>
>>> The problem with what you're suggesting is that with the GNU linkers it is
>>> always possible to define "where" in the command line parsing you are.
>>> However for MSVC or LLD it is not always possible. Think of an object file
>>> in a library that autolinks foo.a that gets pulled into the link (by an
>>> undefined symbol) much later on in the link order. My RFC is careful to try
>>> to set out a scheme that all linkers can implement (as much as is possible).
>>>
>>>>
>>>>
>>>>>> > 4. Duplicate autolinked inputs are ignored.
>>>>>>
>>>>>> If we take the issue of --whole-archive off the table does it matter
>>>>>> that there are duplicate libraries? Unresolved symbols will match
>>>>>> against the first library.
>>>>>
>>>>>
>>>>> It doesn't matter for libraries in LLD; but, it is important for
>>>>> object files. I think that this mechanism should be usable for object files
>>>>> and libraries. This is common in ELF linkers - for example the --library
>>>>> command line option can be used to link object files.
>>>>>
>>>>>>
>>>> Do you actually often link .o file using -l? It seems a bit weird use
>>>> of the option. To me, it seems better to limit the ability of autolinking
>>>> to link against .so or .a.
>>>>
>>>>
>>> I don't personally but it does seem useful to be able to find .o files
>>> on the library search paths.
>>>
>>>
>>
>> Rui - I'm sure you know everything about MSVC linking already! For others'
>> benefit though, MSVC only allows loading of libraries via "comment lib"
>> pragmas. It rejects .obj files.
>>
>
> I'd think that's a better approach than allowing linking against .o using
> autolinking. I can't think of a use case of autolinking against .o, and if
> you really need it, you can easily create an archive containing a single
> object file.
>
>
>>
>> C:\temp\library_semantics>more msvc_foo.c
>> int foo() {return 10;}
>>
>> C:\temp\library_semantics>cl msvc_foo.c /c
>> msvc_foo.c
>>
>> C:\temp\library_semantics>more msvc.c
>> #pragma comment(lib, "msvc_foo.obj")
>> int foo ();
>> int main () {return foo();}
>>
>> C:\temp\library_semantics>cl msvc.c
>> Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24213.1 for x64
>> Copyright (C) Microsoft Corporation.  All rights reserved.
>>
>> msvc.c
>> Microsoft (R) Incremental Linker Version 14.00.24213.1
>> Copyright (C) Microsoft Corporation.  All rights reserved.
>>
>> /out:msvc.exe
>> msvc.obj
>> msvc_foo.obj : warning LNK4003: invalid library format; library ignored
>> msvc.obj : error LNK2019: unresolved external symbol foo referenced in function main
>> msvc.exe : fatal error LNK1120: 1 unresolved externals
>>
>> C:\temp\library_semantics>lib /out:msvc_foo.lib msvc_foo.obj
>> Microsoft (R) Library Manager Version 14.00.24213.1
>> Copyright (C) Microsoft Corporation.  All rights reserved.
>>
>> C:\temp\library_semantics>more msvc.c
>> #pragma comment(lib, "msvc_foo.lib")
>> int foo ();
>> int main (){return foo();}
>>
>> C:\temp\library_semantics>cl msvc.c /link /verbose | grep msvc
>> msvc.c
>> /out:msvc.exe
>> msvc.obj
>> Processed /DEFAULTLIB:msvc_foo.lib
>>     Searching msvc_foo.lib:
>>         Referenced in msvc.obj
>>         Loaded msvc_foo.lib(msvc_foo.obj)
>>  Processed /DISALLOWLIB:msvcrt.lib
>>  Processed /DISALLOWLIB:msvcrtd.lib
>>     Searching msvc_foo.lib:
>>     Searching msvc_foo.lib:
>>     Searching msvc_foo.lib:
>>      msvc.obj
>>      msvc_foo.lib(msvc_foo.obj)
>>
>>
>> Other interesting MSVC behaviour:
>>
>> MSVC forms the library name to search for based on the file extension. An
>> interesting difference is that on Windows import libraries and static
>> archives both have the same naming convention of <basename>.lib. Whereas on
>> Unix dynamic libraries are conventionally named lib<basename>.so and static
>> archives are lib<basename>.a.
>>
>> #pragma comment(lib, "winmm") -> Searches for "winmm.lib" (doesn't search
>> for "winmm")
>> #pragma comment(lib, "winmm.lib") -> Searches for "winmm.lib" (doesn't
>> search for "winmm.lib.lib")
>> #pragma comment(lib, "winmm.lix") -> Searches for "winmm.lix" (doesn't
>> search for "winmm.lix.lib")
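The three observations above are consistent with a rule of "append .lib only when the name has no extension". A minimal sketch of that inferred rule follows; it is a guess from the observed behaviour, not documented MSVC logic, and the function name is hypothetical.

```python
import ntpath  # Windows path rules, available on every platform


def msvc_library_name(pragma_arg):
    """Form the file name to search for from a "comment lib" argument."""
    _, ext = ntpath.splitext(pragma_arg)
    # Assumption: any existing extension is kept as-is, otherwise add ".lib".
    return pragma_arg if ext else pragma_arg + ".lib"


for arg in ("winmm", "winmm.lib", "winmm.lix"):
    print(arg, "->", msvc_library_name(arg))
# winmm -> winmm.lib
# winmm.lib -> winmm.lib
# winmm.lix -> winmm.lix
```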
>>
>> MSVC allows specifying libraries on the command line as just file names
>> or by using the /DEFAULTLIB option. In both cases the rules for locating
>> the library are the same. If a path is specified with the library name,
>> LINK searches for the library in that directory. If no path is specified,
>> LINK looks first in the directory that LINK is running from, and then in
>> any directories specified in the LIB environment variable, see:
>> https://docs.microsoft.com/en-us/cpp/build/reference/dot-lib-files-as-linker-input?view=vs-2017.
>> Additionally, LINK will search for any /LIBPATH paths before those
>> specified in the LIB environment variable, see:
>> https://docs.microsoft.com/en-us/cpp/build/reference/libpath-additional-libpath?view=vs-2017.
>> LINK handles libraries specified via "comment lib" pragmas just as if you
>> had named them on the command line, see:
>> https://docs.microsoft.com/en-us/cpp/preprocessor/comment-c-cpp?view=vs-2017
>> .
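The lookup order described in the paragraph above can be sketched as a function that lists the candidate paths in the order they would be tried. The function and parameter names are hypothetical, and the LIB value is assumed to be a semicolon-separated list of directories.

```python
import ntpath  # Windows path rules, available on every platform


def msvc_search_candidates(name, link_dir, libpaths, lib_env):
    """List the paths to try for a library, in order.

    name:     library name, with or without a directory part.
    link_dir: directory that LINK is running from.
    libpaths: directories given via /LIBPATH, in order.
    lib_env:  value of the LIB environment variable.
    """
    if ntpath.dirname(name):
        # A path was given with the name: only that location is searched.
        return [name]
    dirs = [link_dir] + list(libpaths) + [d for d in lib_env.split(";") if d]
    return [ntpath.join(d, name) for d in dirs]


for path in msvc_search_candidates(
    "winmm.lib", r"C:\work", [r"C:\extra"], r"C:\sdk\lib;C:\vc\lib"
):
    print(path)
# C:\work\winmm.lib
# C:\extra\winmm.lib
# C:\sdk\lib\winmm.lib
# C:\vc\lib\winmm.lib
```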
>>
>> MSVC rules for resolving symbols from libraries: A library specified with
>> /DEFAULTLIB is searched after libraries specified explicitly on the command
>> line and before default libraries named in .obj files (see
>> https://docs.microsoft.com/en-us/cpp/build/reference/nodefaultlib-ignore-libraries?view=vs-2017
>> ).
>>
>> MSVC allows passing not only libraries to the linker via pragmas but also
>> a subset of the linker's command line options (
>> https://docs.microsoft.com/en-us/cpp/preprocessor/comment-c-cpp). In
>> addition to the documented options MSVC also accepts some undocumented
>> options. One of these is the /DISALLOWLIB which allows an object file to
>> state that it is incompatible with a given library, see:
>> https://stackoverflow.com/questions/761394/what-does-the-disallowlib-message-mean-in-vc-linker-output
>> and
>> https://stackoverflow.com/questions/3007312/resolving-lnk4098-defaultlib-msvcrt-conflicts-with
>> .
>>
>> One of the options supported is /DEFAULTLIB. This means you can specify
>> libraries via pragmas with either #pragma comment(lib, <library>) or
>> #pragma comment(linker, "/DEFAULTLIB:<library>").
>>
>> MSVC has the "/NODEFAULTLIB" option which ignores any /DEFAULTLIB options
>> from object files or the command-line. You can also ignore specific
>> libraries, with "/NODEFAULTLIB:name.lib".
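As a rough illustration of the two /NODEFAULTLIB forms, the filter below drops either every default library or only the named ones. The names are hypothetical and the exact string comparison is an assumption; the real linker's matching rules are not described here.

```python
def apply_nodefaultlib(default_libs, ignore_all=False, ignored=()):
    """Filter the default-library list as /NODEFAULTLIB would, in this model.

    ignore_all: models a bare /NODEFAULTLIB.
    ignored:    models one or more /NODEFAULTLIB:name.lib options.
    """
    if ignore_all:
        return []
    ignored = set(ignored)  # assumption: names are compared exactly
    return [lib for lib in default_libs if lib not in ignored]


libs = ["msvc_foo.lib", "winmm.lib"]
print(apply_nodefaultlib(libs, ignored=["winmm.lib"]))  # ['msvc_foo.lib']
print(apply_nodefaultlib(libs, ignore_all=True))        # []
```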
>>
>> Both Gold and GNU-ld allow loading of non-library files via -l/--library
>> options; but, MSVC only allows adding libraries via its equivalent of the
>> -l command:
>>
>> C:\temp\library_semantics>more msvc_foo.c
>> int foo() {return 10;}
>>
>> C:\temp\library_semantics>cl msvc_foo.c /c
>>
>> C:\temp\library_semantics>lib /out:msvc_foo.lib msvc_foo.obj
>>
>> C:\temp\library_semantics>type msvc_main.c
>> void main(){}
>>
>> C:\temp\library_semantics>cl msvc_main.c /link /DEFAULTLIB:foo.obj
>>
>> /out:msvc_main.exe
>> /DEFAULTLIB:foo.obj
>> msvc_main.obj
>> foo.obj : warning LNK4003: invalid library format; library ignored
>>
>>
>> MSVC also ignores duplicate .objs on the command line:
>>
>> c:\temp\library_semantics>cl msvc.obj
>> /out:msvc.exe
>> msvc.obj
>>
>> c:\temp\library_semantics>cl msvc.obj msvc.obj
>> /out:msvc.exe
>> msvc.obj
>> msvc.obj
>> msvc.obj : warning LNK4042: object specified more than once; extras ignored
>>
>>
>>>>>> I guess it might make a difference if this
>>>>>> feature is implemented in ld.lld and ld.gold, where you'd have to wrap
>>>>>> the libraries in a start-group, end-group, but is this likely to
>>>>>> happen?
>>>>>>
>>>>>
>>>>> I would like the design to be such that it could be implemented by GNU.
>>>>>
>>>>>
>>>>>>
>>>>>> > 5. The linker tries to add a library or relocatable object file
>>>>>> from each of the strings in a .autolink section by; first, handling the
>>>>>> string as if it was specified on the command line; second, by looking for
>>>>>> the string in each of the library search paths in turn; third, by looking
>>>>>> for a lib<string>.a or lib<string>.so (depending on the current mode of the
>>>>>> linker) in each of the library search paths.
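The three-stage lookup in point 5 can be sketched as follows. The function names are hypothetical. Trying lib<string>.so before lib<string>.a within each directory when linking dynamically is an assumption borrowed from how -l conventionally behaves; the RFC only says the choice depends on the linker's current mode.

```python
import posixpath


def autolink_candidates(name, search_paths, static=False):
    """List the paths tried for one .autolink string, in order."""
    # First: treat the string exactly as if it were given on the command line.
    candidates = [name]
    # Second: look for the string itself in each library search path.
    candidates += [posixpath.join(d, name) for d in search_paths]
    # Third: look for lib<string>.so / lib<string>.a in each search path.
    suffixes = [".a"] if static else [".so", ".a"]  # assumption, see above
    for d in search_paths:
        for suffix in suffixes:
            candidates.append(posixpath.join(d, f"lib{name}{suffix}"))
    return candidates


def resolve_autolink(name, search_paths, exists, static=False):
    """Return the first candidate accepted by `exists`, else raise."""
    for candidate in autolink_candidates(name, search_paths, static):
        if exists(candidate):
            return candidate
    raise FileNotFoundError(f"autolinked input {name!r} not found")


print(autolink_candidates("foo", ["/usr/lib"]))
# ['foo', '/usr/lib/foo', '/usr/lib/libfoo.so', '/usr/lib/libfoo.a']
```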
>>>>>>
>>>>>> There is some precedent for including files and libraries from
>>>>>> linkerscripts
>>>>>> https://sourceware.org/binutils/docs/ld/File-Commands.html#File-Commands
>>>>>> , these distinguish between "-lfile" and "file". Would this be a
>>>>>> better fit for a ld.bfd interface compatible linker?
>>>>>>
>>>>>>
>>>>> I was hoping to avoid GNUisms and use a "general" mechanism. MSVC
>>>>> source code compatibility is a use case.
>>>>>
>>>>>
>>>>>> > 6. A new command line option --no-llvm-autolink will tell LLD to
>>>>>> ignore the .autolink sections.
>>>>>>
>>>>>> Personally I would have thought --no-llvm-autolink would error if it
>>>>>> found a .autolink section, on the grounds that I wanted all the
>>>>>> libraries to be defined on the command-line or linker script rather
>>>>>> than hidden in object files. I would have thought ignoring the
>>>>>> autolink sections would in most cases result in undefined symbols. If
>>>>>> there is a use case for it, perhaps --ignore-llvm-autolink.
>>>>>>
>>>>>>
>>>>> The use case that I had in mind is that you need to override
>>>>> autolinking. To do so you tell the linker to ignore the embedded
>>>>> autolinking information and construct an equivalent command line. I think
>>>>> your proposed --ignore-llvm-autolink is a better name for this option
>>>>> given the intended semantics.
>>>>>
>>>>>
>>>>>> > Rationale for the above points:
>>>>>> >
>>>>>> > 1. Adding the autolinked inputs last makes the process simple to
>>>>>> understand from a developer's perspective. All linkers are able to implement
>>>>>> this scheme.
>>>>>> > 2. Erroring for libraries that are not found seems like better
>>>>>> behavior than failing the link during symbol resolution.
>>>>>> > 3. It seems useful for the user to be able to apply command line
>>>>>> options which will affect all of the autolinked input files. There is a
>>>>>> potential problem of surprise for developers, who might not realize that
>>>>>> these options would apply to the "invisible" autolinked input files;
>>>>>> however, despite the potential for surprise, this is easy for developers to
>>>>>> reason about and gives developers the control that they may require.
>>>>>> > 4. Unlike on the command line it is probably easy to include the
>>>>>> same input file twice via pragmas and might be a pain to fix; think of
>>>>>> third-party libraries supplied as binaries.
>>>>>> > 5. This algorithm takes into account all of the different ways that
>>>>>> ELF linkers find input files. The different search methods are tried by the
>>>>>> linker in most obvious to least obvious order.
>>>>>> > 6. I considered adding finer grained control over which .autolink
>>>>>> inputs were ignored (e.g. MSVC has /nodefaultlib:<library>); however, I
>>>>>> concluded that this is not necessary: if finer control is required
>>>>>> developers can recreate the same effect autolinking would have had using
>>>>>> command line options.
>>>>>> >
>>>>>> > Thoughts?
>>>>>> >
>>>>>> > _______________________________________________
>>>>>> > LLVM Developers mailing list
>>>>>> > llvm-dev at lists.llvm.org
>>>>>> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>
>>>>>
>>>>