[llvm-dev] RFC [ThinLTO]: Promoting more aggressively in order to reduce incremental link time and allow sharing between linkage units

Thu Apr 7 10:58:09 PDT 2016

On Wed, Apr 6, 2016 at 9:53 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:

>
> On Apr 6, 2016, at 9:40 PM, Teresa Johnson <tejohnson at google.com> wrote:
>
>
>
> On Wed, Apr 6, 2016 at 5:13 PM, Peter Collingbourne <peter at pcc.me.uk>
> wrote:
>
>>
>>
>> On Wed, Apr 6, 2016 at 4:53 PM, Mehdi Amini <mehdi.amini at apple.com>
>> wrote:
>>
>>>
>>> On Apr 6, 2016, at 4:41 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
>>>
>>> Hi all,
>>>
>>> I'd like to propose changes to how we do promotion of global values in
>>> ThinLTO. The goal here is to make it possible to pre-compile parts of the
>>> translation unit to native code at compile time. For example, if we know
>>> that:
>>>
>>> 1) A function is a leaf function, so it will never import any other
>>> functions, and
>>>
>>>
>>> It still may be imported somewhere else right?
>>>
>>> 2) The function's instruction count falls above a threshold specified at
>>> compile time, so it will never be imported.
>>>
>>>
>>> It won’t be imported, but unless it is a “leaf” it may import and inline
>>> itself.
>>>
>>
>>> or
>>> 3) The compile-time threshold is zero, so there is no possibility of
>>> functions being imported (What's the utility of this? Consider a program
>>> transformation that requires whole-program information, such as CFI. During
>>> development, the import threshold may be set to zero in order to minimize
>>> the incremental link time while still providing the same CFI enforcement
>>> that would be used in production builds of the application.)
>>>
>>> then the function's body will not be affected by link-time decisions,
>>> and we might as well produce its object code at compile time.
>>>
>>>
>>> Reading this last sentence, it seems exactly the “non-LTO” case?
>>>
>>
>> Yes, basically the point of this proposal is to be able to split the
>> linkage unit into LTO and non-LTO parts.
>>
>>
>>> This will also allow the object code to be shared between linkage units
>>> (this should hopefully help solve a major scalability problem for Chromium,
>>> as that project contains a large number of test binaries based on common
>>> libraries).
>>>
>>> This can be done with a change to the intermediate object file format.
>>> We can represent object files as native code containing statically compiled
>>> functions and global data in the .text,. data, .rodata (etc.) sections,
>>> with an .llvmbc section (or, I suppose, "__LLVM, __bitcode" when targeting
>>> Mach-O) containing bitcode for functions to be compiled at link time.
>>>
>>> In order to make this work, we need to make sure that references from
>>> link-time compiled functions to statically compiled functions work
>>> correctly in the case where the statically compiled function has internal
>>> linkage. We can do this by promoting every global value with internal
>>> linkage, using a hash of the external names (as I mentioned in [1]).
>>>
>>>
> Mehdi - I know you were keen to reduce the amount of promotion. Is that
> still an issue for you assuming linker GC (dead stripping)?
>
>
> Yes: we do better optimization on internal function in general.
>

Inliner is one of the affected optimization -- however this sounds like a
matter of tuning to teach inliner about promoted static functions.

David

> Our benchmarks showed that it can really make some difference, and many
> cases were ThinLTO didn’t perform as well as FullLTO were because of this
> promotion.
> (binary size has never been my concern here)
>

>
> With this proposal we will need to stick with the current promote
> everything scheme.
>
>
> I don’t think so: you would need do it only for “internal functions that a
> leaf and aren’t likely to be imported/inlined”.
> That said any function that we emit the binary at compile time instead of
> link time will contribute to inhibit optimizations for LTO/ThinLTO. The
> gain in compile time has to be really important to make it worth it.
> (Of course the CFI use-case is a totally different tradeoff).
>
> Peter: have you thought about debug info by the way?
>
> —
> Mehdi
>
>
>
>
>
>>
>>> I imagine that for some linkers, it may not be possible to deal with
>>> this scheme. For example, I did some investigation last year and discovered
>>> that I could not use the gold plugin interface to load a native object file
>>> if we had already claimed it as an IR file. I wouldn't be surprised to
>>> learn that ld64 has similar problems.
>>>
>>>
>>> I suspect ld64 would need to be update to handle this scheme. Somehow it
>>> need to be able to extract the object from the section.
>>>
>>
>> Do you mean the bitcode object? There's already support in LLVM for this (
>> http://llvm-cs.pcc.me.uk/lib/Object/IRObjectFile.cpp#269). I suspect the
>> tricky part (which I was unsuccessful at doing with gold) will be
>> convincing the linker that the bitcode object and the regular object are
>> two separate things.
>>
>> Otherwise this should work, but I suspect the applicability (leaving CFI
>>> aside) may not concern that many functions, so I’m not sure about the
>>> impact.
>>>
>>
>> I'm also curious about the applicability in the regular ThinLTO case. One
>> thing that may help with applicability is that I suspect that not only leaf
>> functions but also functions that only call (directly or indirectly)
>> functions in the same TU, as well as functions that only make calls via
>> function pointers or vtables may fall under the criteria for non-importing.
>>
>
> With indirect call profiling the function pointer case should not be a
> blocker for importing.
>
> Teresa
>
>
>>
>> Peter
>>
>
>
>
> --
> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
> 408-460-2413
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160407/24dfa3eb/attachment-0001.html>