[llvm-dev] RFC [ThinLTO]: Promoting more aggressively in order to reduce incremental link time and allow sharing between linkage units
Peter Collingbourne via llvm-dev
llvm-dev at lists.llvm.org
Thu Apr 7 13:21:22 PDT 2016
On Thu, Apr 7, 2016 at 12:52 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
> On Apr 7, 2016, at 12:39 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
>
>
>
> On Thu, Apr 7, 2016 at 12:29 PM, Mehdi Amini <mehdi.amini at apple.com>
> wrote:
>
>>
>> On Apr 7, 2016, at 11:59 AM, Xinliang David Li <davidxl at google.com>
>> wrote:
>>
>>
>>
>> On Thu, Apr 7, 2016 at 11:26 AM, Mehdi Amini <mehdi.amini at apple.com>
>> wrote:
>>
>>>
>>> On Apr 7, 2016, at 10:58 AM, Xinliang David Li <davidxl at google.com>
>>> wrote:
>>>
>>>
>>>
>>> On Wed, Apr 6, 2016 at 9:53 PM, Mehdi Amini <mehdi.amini at apple.com>
>>> wrote:
>>>
>>>>
>>>> On Apr 6, 2016, at 9:40 PM, Teresa Johnson <tejohnson at google.com>
>>>> wrote:
>>>>
>>>>
>>>>
>>>> On Wed, Apr 6, 2016 at 5:13 PM, Peter Collingbourne <peter at pcc.me.uk>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wed, Apr 6, 2016 at 4:53 PM, Mehdi Amini <mehdi.amini at apple.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> On Apr 6, 2016, at 4:41 PM, Peter Collingbourne <peter at pcc.me.uk>
>>>>>> wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I'd like to propose changes to how we do promotion of global values
>>>>>> in ThinLTO. The goal here is to make it possible to pre-compile parts of
>>>>>> the translation unit to native code at compile time. For example, if we
>>>>>> know that:
>>>>>>
>>>>>> 1) A function is a leaf function, so it will never import any other
>>>>>> functions, and
>>>>>>
>>>>>>
>>>>>> It may still be imported somewhere else, right?
>>>>>>
>>>>>> 2) The function's instruction count falls above a threshold specified
>>>>>> at compile time, so it will never be imported.
>>>>>>
>>>>>>
>>>>>> It won’t be imported, but unless it is a “leaf” it may import other
>>>>>> functions and inline them into itself.
>>>>>>
>>>>>
>>>>>> or
>>>>>> 3) The compile-time threshold is zero, so there is no possibility of
>>>>>> functions being imported (What's the utility of this? Consider a program
>>>>>> transformation that requires whole-program information, such as CFI. During
>>>>>> development, the import threshold may be set to zero in order to minimize
>>>>>> the incremental link time while still providing the same CFI enforcement
>>>>>> that would be used in production builds of the application.)
>>>>>>
>>>>>> then the function's body will not be affected by link-time decisions,
>>>>>> and we might as well produce its object code at compile time.
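A minimal C++ sketch of the test described above, written as "(1 and 2) or 3". It assumes a function is imported only when its instruction count is at or below the compile-time threshold. `FunctionSummarySketch` and `canCompileAtCompileTime` are hypothetical names, not LLVM APIs.

```cpp
// Hypothetical per-function summary (not an LLVM type).
struct FunctionSummarySketch {
  bool IsLeaf;        // calls no other functions, so it never imports anything
  unsigned InstCount; // instruction count measured at compile time
};

// Assumption for this sketch: a function is imported only if
// InstCount <= ImportThreshold.
// Returns true if link-time decisions cannot change the function's body,
// so its object code could be produced at compile time.
bool canCompileAtCompileTime(const FunctionSummarySketch &F,
                             unsigned ImportThreshold) {
  // Case 3: a zero threshold means no function is ever imported.
  if (ImportThreshold == 0)
    return true;
  // Cases 1 and 2: a leaf imports nothing, and a body above the threshold
  // is never imported into other modules.
  return F.IsLeaf && F.InstCount > ImportThreshold;
}
```

Both halves of the leaf/threshold condition are needed, as the replies point out: a leaf may still be imported elsewhere, and a large non-leaf may still import and inline other functions.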
>>>>>>
>>>>>>
>>>>>> Reading this last sentence, isn't this exactly the “non-LTO” case?
>>>>>>
>>>>>
>>>>> Yes, basically the point of this proposal is to be able to split the
>>>>> linkage unit into LTO and non-LTO parts.
>>>>>
>>>>>
>>>>>> This will also allow the object code to be shared between linkage
>>>>>> units (this should hopefully help solve a major scalability problem for
>>>>>> Chromium, as that project contains a large number of test binaries based on
>>>>>> common libraries).
>>>>>>
>>>>>> This can be done with a change to the intermediate object file
>>>>>> format. We can represent object files as native code containing statically
>>>>>> compiled functions and global data in the .text, .data, .rodata (etc.)
>>>>>> sections, with an .llvmbc section (or, I suppose, "__LLVM,__bitcode" when
>>>>>> targeting Mach-O) containing bitcode for functions to be compiled at link
>>>>>> time.
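The split between statically compiled and link-time functions can be modeled with a small in-memory view of the proposed hybrid object. This is a hypothetical C++ sketch, not an LLVM data structure. `HybridObjectSketch`, `partitionModule` and the `IsLinkTimeIndependent` predicate are illustrative names. The predicate stands in for a check like the one described earlier in the thread.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical view of the proposed hybrid object file (not an LLVM type).
struct HybridObjectSketch {
  std::string BitcodeSection;                // section holding link-time bitcode
  std::vector<std::string> NativeFunctions;  // compiled now, placed in .text
  std::vector<std::string> BitcodeFunctions; // deferred to link time
};

// Splits a module's functions into the non-LTO (native) and LTO (bitcode)
// parts of the linkage unit.
HybridObjectSketch
partitionModule(const std::vector<std::string> &Functions,
                const std::function<bool(const std::string &)> &IsLinkTimeIndependent,
                bool TargetIsMachO) {
  HybridObjectSketch Obj;
  // Section naming follows the thread: .llvmbc on ELF, __LLVM,__bitcode on Mach-O.
  Obj.BitcodeSection = TargetIsMachO ? "__LLVM,__bitcode" : ".llvmbc";
  for (const std::string &F : Functions) {
    if (IsLinkTimeIndependent(F))
      Obj.NativeFunctions.push_back(F);
    else
      Obj.BitcodeFunctions.push_back(F);
  }
  return Obj;
}
```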
>>>>>>
>>>>>> In order to make this work, we need to make sure that references from
>>>>>> link-time compiled functions to statically compiled functions work
>>>>>> correctly in the case where the statically compiled function has internal
>>>>>> linkage. We can do this by promoting every global value with internal
>>>>>> linkage, using a hash of the external names (as I mentioned in [1]).
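A minimal C++ sketch of promotion by hashing the module's external names. The choice of 64-bit FNV-1a and the `.llvm.<hex>` suffix are assumptions made for illustration. They are not the scheme referenced in [1] and not LLVM's actual naming.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// 64-bit FNV-1a constants (hash choice is an assumption for this sketch).
constexpr std::uint64_t kFnvOffset = 14695981039346656037ULL;
constexpr std::uint64_t kFnvPrime = 1099511628211ULL;

// Folds the bytes of S into the running FNV-1a hash H.
std::uint64_t fnv1a(const std::string &S, std::uint64_t H) {
  for (unsigned char C : S) {
    H ^= C;
    H *= kFnvPrime;
  }
  return H;
}

// Hashes a module's externally visible names. Sorting first makes the hash
// independent of symbol order; the NUL separator keeps {"ab","c"} distinct
// from {"a","bc"}.
std::uint64_t hashExternalNames(std::vector<std::string> Names) {
  std::sort(Names.begin(), Names.end());
  std::uint64_t H = kFnvOffset;
  for (const std::string &N : Names)
    H = fnv1a(N + '\0', H);
  return H;
}

// Gives an internal symbol a module-specific external name.
// The ".llvm.<16 hex digits>" format is hypothetical.
std::string promotedName(const std::string &LocalName, std::uint64_t ModuleHash) {
  char Hex[17];
  std::snprintf(Hex, sizeof(Hex), "%016llx",
                static_cast<unsigned long long>(ModuleHash));
  return LocalName + ".llvm." + Hex;
}
```

Because the hash depends only on the module's contents, a rebuilt but unchanged module would keep the same promoted names no matter which linkage unit it ends up in. That is presumably what makes the native code shareable. One caveat of this simplified version is that two modules with identical external name sets would produce the same suffix.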
>>>>>>
>>>>>>
>>>> Mehdi - I know you were keen to reduce the amount of promotion. Is that
>>>> still an issue for you assuming linker GC (dead stripping)?
>>>>
>>>>
>>>> Yes: we do better optimization on internal functions in general.
>>>>
>>>
>>> The inliner is one of the affected optimizations; however, this sounds
>>> like a matter of tuning to teach the inliner about promoted static functions.
>>>
>>>
>>> The inliner computes a tradeoff between estimated runtime cost and binary
>>> size. The existing bonus for static functions applies when there is a
>>> single call site, because inlining then causes no binary size increase
>>> (the static function is dropped after inlining). We promote a function
>>> because we think we are likely to introduce a reference to it somewhere
>>> else, so “lying” to the inliner is not necessarily a good idea.
>>>
>>
>> It is not lying to the inliner. If a static (before promotion) function
>> is a candidate to be inlined in the original defining module, it is
>> probably more likely to be inlined in other importing modules, where more
>> context is available. In other words, the inliner can apply the same bonus
>> to 'promoted' static functions, as if the references in other modules will
>> also disappear. Of course, we cannot assume it has a single call site.
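The two positions above can be compared with a simplified, hypothetical C++ inline-cost model. It is not LLVM's InlineCost implementation, and the bonus constant is illustrative. The `TreatPromotedAsStatic` flag toggles the suggestion of giving promoted statics the same single-call-site bonus.

```cpp
// Hypothetical linkage classification for this sketch.
enum class CalleeLinkage { External, Internal, PromotedInternal };

struct CalleeSketch {
  CalleeLinkage Linkage;
  unsigned LocalCallSites; // call sites visible in the current module
  int SizeCost;            // estimated cost of inlining the body
};

// Illustrative stand-in for a "last call to a static function" bonus.
constexpr int kLastCallToStaticBonus = 15000;

bool shouldInline(const CalleeSketch &C, int Threshold,
                  bool TreatPromotedAsStatic) {
  bool StaticLike =
      C.Linkage == CalleeLinkage::Internal ||
      (TreatPromotedAsStatic && C.Linkage == CalleeLinkage::PromotedInternal);
  int Cost = C.SizeCost;
  // For a true static with one call site, the body can be deleted after
  // inlining, so inlining does not grow the binary. For a promoted static
  // this is only an optimistic assumption about the importing modules.
  if (StaticLike && C.LocalCallSites == 1)
    Cost -= kLastCallToStaticBonus;
  return Cost < Threshold;
}
```

With the flag off, a promoted static with one local call site loses the bonus that the same function would get as a true static. This is the loss of optimization Mehdi describes.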
>>
>> Comdat functions can be handled similarly.
>>
>>
>>
>>> That said, we (actually Bruno) already prototyped it, with fairly
>>> good results :)
>>> I’m not yet convinced that it should be independent of whether the
>>> function is promoted or not, though.
>>>
>>
>> Generally true (see the comdat case).
>>
>>
>>>
>>> Assuming we solve the inliner issue, there remain the “optimizations
>>> other than the inliner”. We can probably solve most of them, but I
>>> suspect it won’t be “trivial” either.
>>>
>>
>>
>> Any such optimizations in mind?
>>
>>
>> I don’t have the details, but in short:
>>
>> For promoted functions: IPSCCP, dead arg elimination
>> For promoted global variables: anything that is somehow affected by
>> aliasing
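A simplified model shows why promotion gets in the way of dead argument elimination. Removing a parameter means rewriting the signature and every call site, which is only safe when all callers are visible. IPSCCP propagating constants into arguments is limited in a similar way. The names below are hypothetical, and the model ignores everything else such a pass does.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical linkage classification for this sketch.
enum class LinkageSketch { Internal, PromotedToExternal };

struct FunctionSketch {
  LinkageSketch Linkage;
  std::vector<bool> ArgUsedInBody; // ArgUsedInBody[i]: is argument i read?
};

// Returns the indices of arguments that a dead-argument-elimination-style
// rewrite could delete from the signature and from every call site.
std::vector<std::size_t> removableArgs(const FunctionSketch &F) {
  std::vector<std::size_t> Dead;
  // Once promoted, callers in importing modules may rely on the original
  // signature, so it cannot be rewritten.
  if (F.Linkage != LinkageSketch::Internal)
    return Dead;
  for (std::size_t I = 0; I < F.ArgUsedInBody.size(); ++I)
    if (!F.ArgUsedInBody[I])
      Dead.push_back(I);
  return Dead;
}
```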
>>
>
> When are you imagining that promotion would happen? If it happens just
> before codegen (or bitcode emission), it wouldn't inhibit these
> optimizations, right?
>
>
> For ThinLTO it has to happen before the link-time optimizations, because
> of cross-module importing.
>
Are you referring to the fact that these optimizations would be inhibited
compared to regular LTO, since we cannot internalize? Yes, that does seem
like an issue.
--
Peter