[llvm-dev] RFC [ThinLTO]: Promoting more aggressively in order to reduce incremental link time and allow sharing between linkage units

Wed May 4 08:07:47 PDT 2016

On Tue, May 3, 2016 at 10:25 PM, Peter Collingbourne via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> On Tue, May 3, 2016 at 10:04 PM, Mehdi Amini <mehdi.amini at apple.com>
> wrote:
>
>>
>> On May 3, 2016, at 10:01 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
>>
>>
>>
>> On Tue, May 3, 2016 at 9:01 PM, Mehdi Amini <mehdi.amini at apple.com>
>> wrote:
>>
>>>
>>> On Apr 6, 2016, at 4:41 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
>>>
>>> Hi all,
>>>
>>> I'd like to propose changes to how we do promotion of global values in
>>> ThinLTO. The goal here is to make it possible to pre-compile parts of the
>>> translation unit to native code at compile time. For example, if we know
>>> that:
>>>
>>> 1) A function is a leaf function, so it will never import any other
>>> functions, and
>>> 2) The function's instruction count falls above a threshold specified at
>>> compile time, so it will never be imported.
>>> or
>>> 3) The compile-time threshold is zero, so there is no possibility of
>>> functions being imported (What's the utility of this? Consider a program
>>> transformation that requires whole-program information, such as CFI. During
>>> development, the import threshold may be set to zero in order to minimize
>>> the incremental link time while still providing the same CFI enforcement
>>> that would be used in production builds of the application.)
>>>
>>> then the function's body will not be affected by link-time decisions,
>>> and we might as well produce its object code at compile time. This will
>>> also allow the object code to be shared between linkage units (this should
>>> hopefully help solve a major scalability problem for Chromium, as that
>>> project contains a large number of test binaries based on common libraries).
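
The three conditions above can be read as a simple predicate. The following is only a sketch of that reading, not code from any prototype: `FunctionSummary`, `can_precompile`, and `import_threshold` are hypothetical names invented for illustration, and the real decision would be made inside the compiler on its own data structures.

```python
from dataclasses import dataclass


@dataclass
class FunctionSummary:
    """Hypothetical per-function facts known at compile time."""
    name: str
    instruction_count: int
    is_leaf: bool  # calls no other functions, so it never imports anything


def can_precompile(fn: FunctionSummary, import_threshold: int) -> bool:
    """Return True if link-time importing cannot change fn's body."""
    # Condition 3: a zero threshold disables importing altogether.
    if import_threshold == 0:
        return True
    # Conditions 1 and 2: the function imports nothing (leaf) and is too
    # large to ever be imported into another module.
    return fn.is_leaf and fn.instruction_count > import_threshold
```

With a nonzero threshold, both the leaf test and the size test must hold; with a threshold of zero, every function qualifies, which corresponds to the CFI development scenario described above.
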
>>>
>>> This can be done with a change to the intermediate object file format.
>>> We can represent object files as native code containing statically compiled
>>> functions and global data in the .text, .data, .rodata (etc.) sections,
>>> with an .llvmbc section (or, I suppose, "__LLVM, __bitcode" when targeting
>>> Mach-O) containing bitcode for functions to be compiled at link time.
>>>
>>>
>>> I was wondering why the "precompiled" function can't be embedded in the
>>> IR, instead of the bitcode being embedded in the object file?
>>> The codegen would still emit a single object file out of this IR file,
>>> containing both the code generated from the IR and the precompiled function.
>>>
>>> It seems to me that this way the scheme would work with any existing
>>> LTO implementation.
>>>
>>
>> You'd still have the same problem. No matter whether you put the native
>> object inside the IR file or vice versa, you still have a file containing a
>> native object and some IR. That's the scenario that I found that the gold
>> plugin interface wouldn't support.
>>
>>
>> It is not clear to me why it is a problem for gold: it does not need to
>> know that the IR file contains some native precompiled code: it only needs
>> to know that this is an "LLVM file" that will be passed to LLVM for LTO,
>> and it will get a single object file in return.
>> Can you elaborate on why the linker needs to know beforehand and
>> differentiate?
>>
>
> (There wouldn't just be one object file, there would be N native objects
> and 1 (or N if ThinLTO) combined LTO objects.)
>

If the system can already handle the N case for ThinLTO, that seems like
it would solve the problem here, right? (LTO, when asked by the linker to
produce the N object files, would build N/2 of them from IR and produce
the other N/2 from code that was already compiled, by spitting out the
embedded object code from the IR into a new file without touching any of
the bits.) But perhaps I'm not understanding something.

I think that's what Mehdi means by not having to modify existing linkers -
it seems anything that can cope with ThinLTO could cope with a few more
files being created, no? (I don't know too much about this stuff, though)
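
To make the idea concrete, here is a rough sketch of what I have in mind. Everything in it is made up for illustration: `InputFile`, `compile_ir` and `lto_outputs` are hypothetical names, `compile_ir` is only a stand-in for the real backend, and I'm assuming each input carries both some IR and an already-compiled native object.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InputFile:
    """Hypothetical mixed input: link-time IR plus optional embedded native code."""
    name: str
    bitcode: bytes
    embedded_object: Optional[bytes] = None


def compile_ir(bitcode: bytes) -> bytes:
    """Stand-in for the real backend; it only tags the input so the flow is visible."""
    return b"OBJ:" + bitcode


def lto_outputs(inputs: List[InputFile]) -> List[bytes]:
    """Return every object the linker gets back from the LTO step."""
    outputs = []
    for f in inputs:
        # One object is built from the IR at link time, as ThinLTO does today.
        outputs.append(compile_ir(f.bitcode))
        # The precompiled object is handed back byte for byte, untouched.
        if f.embedded_object is not None:
            outputs.append(f.embedded_object)
    return outputs
```

From the linker's point of view this is just a longer list of returned objects; whether gold actually accepts that is the open question in this thread.
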

>
> In principle, it doesn't need to know. In practice, I found that in my
> prototype I couldn't persuade gold to accept what I was doing without
> giving undefined symbol errors.
>
> I suppose I could have debugged it further, but I couldn't justify
> spending more time on it, since the projects I care about are interested in
> switching to lld for other reasons.
>
> Peter
>
>
>
>>
>> --
>> Mehdi
>>
>>
>> Supporting IR embedded in a native object section inside a linker should
>> be pretty trivial, if you control the linker. My prototype implementation
>> in lld is about 10 lines of code.
>>
>> Peter
>>
>>
>>> --
>>> Mehdi
>>>
>>>
>>>
>>>
>>> In order to make this work, we need to make sure that references from
>>> link-time compiled functions to statically compiled functions work
>>> correctly in the case where the statically compiled function has internal
>>> linkage. We can do this by promoting every global value with internal
>>> linkage, using a hash of the external names (as I mentioned in [1]).
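
The promotion step described above might look roughly like the sketch below. It is an illustration under stated assumptions rather than the scheme from [1]: the choice of SHA-256, the 16-character truncation, the ".llvm." separator, and the names `module_hash` and `promote_internals` are all hypothetical.

```python
import hashlib
from typing import Dict, Iterable


def module_hash(external_names: Iterable[str]) -> str:
    """Derive a per-module identifier from the module's external symbol names."""
    h = hashlib.sha256()
    # Sort so the result does not depend on declaration order.
    for name in sorted(external_names):
        h.update(name.encode("utf-8"))
        h.update(b"\0")  # separator, so ["ab", "c"] differs from ["a", "bc"]
    return h.hexdigest()[:16]


def promote_internals(internal_names: Iterable[str],
                      external_names: Iterable[str]) -> Dict[str, str]:
    """Map each internal-linkage name to a promoted, module-unique name."""
    suffix = module_hash(external_names)
    return {name: f"{name}.llvm.{suffix}" for name in internal_names}
```

Because the suffix depends only on the module's own external names, it can be computed at compile time, so statically compiled code and link-time compiled code agree on the promoted name without consulting the linker.
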
>>>
>>> I imagine that for some linkers, it may not be possible to deal with
>>> this scheme. For example, I did some investigation last year and discovered
>>> that I could not use the gold plugin interface to load a native object file
>>> if we had already claimed it as an IR file. I wouldn't be surprised to
>>> learn that ld64 has similar problems.
>>>
>>> In cases where we completely control the linker (e.g. lld), we can
>>> easily support this scheme, as the linker can directly do whatever it
>>> wants. But for linkers that cannot support this, I suggest that we promote
>>> consistently under ThinLTO rather than having different promotion schemes
>>> for different linkers, in order to reduce overall complexity.
>>>
>>> Thanks for your feedback!
>>>
>>> Thanks,
>>> --
>>> --
>>> Peter
>>>
>>> [1] http://lists.llvm.org/pipermail/llvm-dev/2016-April/098062.html
>>>
>>>
>>>
>>
>>
>> --
>> --
>> Peter
>>
>>
>>
>
>
> --
> --
> Peter
>