[llvm-dev] RFC [ThinLTO]: Promoting more aggressively in order to reduce incremental link time and allow sharing between linkage units

Thu Apr 7 12:39:34 PDT 2016

On Thu, Apr 7, 2016 at 12:29 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:

>
> On Apr 7, 2016, at 11:59 AM, Xinliang David Li <davidxl at google.com> wrote:
>
>
>
> On Thu, Apr 7, 2016 at 11:26 AM, Mehdi Amini <mehdi.amini at apple.com>
> wrote:
>
>>
>> On Apr 7, 2016, at 10:58 AM, Xinliang David Li <davidxl at google.com>
>> wrote:
>>
>>
>>
>> On Wed, Apr 6, 2016 at 9:53 PM, Mehdi Amini <mehdi.amini at apple.com>
>> wrote:
>>
>>>
>>> On Apr 6, 2016, at 9:40 PM, Teresa Johnson <tejohnson at google.com> wrote:
>>>
>>>
>>>
>>> On Wed, Apr 6, 2016 at 5:13 PM, Peter Collingbourne <peter at pcc.me.uk>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Apr 6, 2016 at 4:53 PM, Mehdi Amini <mehdi.amini at apple.com>
>>>> wrote:
>>>>
>>>>>
>>>>> On Apr 6, 2016, at 4:41 PM, Peter Collingbourne <peter at pcc.me.uk>
>>>>> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I'd like to propose changes to how we do promotion of global values in
>>>>> ThinLTO. The goal here is to make it possible to pre-compile parts of the
>>>>> translation unit to native code at compile time. For example, if we know
>>>>> that:
>>>>>
>>>>> 1) A function is a leaf function, so it will never import any other
>>>>> functions, and
>>>>>
>>>>>
>>>>> It may still be imported somewhere else, right?
>>>>>
>>>>> 2) The function's instruction count falls above a threshold specified
>>>>> at compile time, so it will never be imported.
>>>>>
>>>>>
>>>>> It won’t be imported, but unless it is a “leaf” it may import other
>>>>> functions and inline them into itself.
>>>>>
>>>>
>>>>> or
>>>>> 3) The compile-time threshold is zero, so there is no possibility of
>>>>> functions being imported (What's the utility of this? Consider a program
>>>>> transformation that requires whole-program information, such as CFI. During
>>>>> development, the import threshold may be set to zero in order to minimize
>>>>> the incremental link time while still providing the same CFI enforcement
>>>>> that would be used in production builds of the application.)
>>>>>
>>>>> then the function's body will not be affected by link-time decisions,
>>>>> and we might as well produce its object code at compile time.
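Here is a minimal sketch of the eligibility test described above. It is a toy model in Python, not LLVM code. `FunctionInfo` and `can_precompile` are hypothetical names, and it assumes leaf-ness and instruction counts are already known. It encodes the conditions as (1 and 2) or 3, which also reflects the two objections raised inline: a leaf can still be imported elsewhere, and a large non-leaf can still import and inline other functions.

```python
from dataclasses import dataclass


@dataclass
class FunctionInfo:
    """Hypothetical per-function facts a compiler would already know."""
    name: str
    is_leaf: bool            # calls no other functions, so it never imports
    instruction_count: int


def can_precompile(f: FunctionInfo, import_threshold: int) -> bool:
    """True if link-time import decisions cannot change f's body.

    Encodes the conditions from the proposal as (1 and 2) or 3.
    """
    # 3) With a zero threshold, nothing is imported anywhere.
    if import_threshold == 0:
        return True
    # 1) A leaf imports nothing into itself, and
    # 2) a body above the threshold is never imported into other modules.
    return f.is_leaf and f.instruction_count > import_threshold
```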
>>>>>
>>>>>
>>>>> Reading this last sentence, isn’t this exactly the “non-LTO” case?
>>>>>
>>>>
>>>> Yes, basically the point of this proposal is to be able to split the
>>>> linkage unit into LTO and non-LTO parts.
>>>>
>>>>
>>>>> This will also allow the object code to be shared between linkage
>>>>> units (this should hopefully help solve a major scalability problem for
>>>>> Chromium, as that project contains a large number of test binaries based on
>>>>> common libraries).
>>>>>
>>>>> This can be done with a change to the intermediate object file format.
>>>>> We can represent object files as native code containing statically compiled
>>>>> functions and global data in the .text, .data, .rodata (etc.) sections,
>>>>> with an .llvmbc section (or, I suppose, "__LLVM,__bitcode" when targeting
>>>>> Mach-O) containing bitcode for functions to be compiled at link time.
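The proposed object layout can be modeled as a partition of a module's functions into a natively compiled section and an embedded bitcode section. This is a hypothetical Python sketch. `HybridObject`, `build_object` and the `precompile` predicate are illustrative names, not an existing LLVM API. Data sections such as .data and .rodata are omitted.

```python
from dataclasses import dataclass, field

# Bitcode section name per object format, as named in the proposal.
# Mach-O spells sections as "segment,section".
BITCODE_SECTION = {"elf": ".llvmbc", "macho": "__LLVM,__bitcode"}


@dataclass
class HybridObject:
    """Hypothetical model of the proposed intermediate object file."""
    object_format: str
    native: dict = field(default_factory=dict)   # section name -> symbols
    bitcode: list = field(default_factory=list)  # compiled at link time

    @property
    def bitcode_section(self) -> str:
        return BITCODE_SECTION[self.object_format]


def build_object(object_format, functions, precompile):
    """Split functions between native .text and the bitcode section.

    `precompile(name)` says whether a function's body is already final
    at compile time.
    """
    obj = HybridObject(object_format)
    for name in functions:
        if precompile(name):
            obj.native.setdefault(".text", []).append(name)
        else:
            obj.bitcode.append(name)
    return obj
```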
>>>>>
>>>>> In order to make this work, we need to make sure that references from
>>>>> link-time compiled functions to statically compiled functions work
>>>>> correctly in the case where the statically compiled function has internal
>>>>> linkage. We can do this by promoting every global value with internal
>>>>> linkage, using a hash of the external names (as I mentioned in [1]).
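Here is a sketch of promoting every internal symbol using a module-specific suffix derived from a hash of the module's external names. Several choices here are assumptions, not part of the proposal: SHA-1 truncated to 16 hex digits as the hash, a ".llvm.<hash>" name suffix, and a plain dict from symbol name to linkage standing in for the symbol table. Sorting the names makes the suffix independent of declaration order.

```python
import hashlib


def module_hash(external_names):
    """Order-independent hash of a module's externally visible names.

    SHA-1 is an arbitrary stand-in; the hash function is not specified.
    """
    h = hashlib.sha1()
    for name in sorted(external_names):
        h.update(name.encode("utf-8"))
        h.update(b"\0")  # separator so ["ab"] and ["a", "b"] differ
    return h.hexdigest()[:16]


def promote_locals(symbols):
    """Give every internal symbol a unique external name.

    `symbols` maps name -> linkage ("internal" or "external").
    Returns (new symbol table, old name -> new name).
    The ".llvm.<hash>" suffix format is assumed for illustration.
    """
    suffix = module_hash(
        name for name, linkage in symbols.items() if linkage == "external"
    )
    promoted, renames = {}, {}
    for name, linkage in symbols.items():
        if linkage == "internal":
            new_name = f"{name}.llvm.{suffix}"
            renames[name] = new_name
            promoted[new_name] = "external"
        else:
            promoted[name] = linkage
    return promoted, renames
```

The suffix depends only on the external names, so recompiling the same module reproduces the same promoted names. Two modules that each define a static `helper` get different names as long as their sets of external names differ.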
>>>>>
>>>>>
>>> Mehdi - I know you were keen to reduce the amount of promotion. Is that
>>> still an issue for you assuming linker GC (dead stripping)?
>>>
>>>
>>> Yes: we generally optimize internal functions better.
>>>
>>
>> The inliner is one of the affected optimizations; however, this sounds like
>> a matter of tuning, i.e. teaching the inliner about promoted static functions.
>>
>>
>> The inliner computes a tradeoff between pseudo runtime cost and binary
>> size. The existing bonus for static functions applies when there is a
>> single call site, because then the binary size does not increase (the
>> static is dropped after inlining). We promote a function because we think
>> we are likely to introduce a reference to it somewhere else, so “lying” to
>> the inliner is not necessarily a good idea.
>>
>
> It is not lying to the inliner. If a static (before promotion) function is
> a candidate to be inlined in the original defining module, it is probably
> more likely to be inlined in other importing modules, where more context is
> available. In other words, the inliner can apply the same bonus to
> 'promoted' static functions as if references in other modules will also
> disappear. Of course, we cannot assume it has a single call site.
>
> Comdat functions can be handled similarly.
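Here is one possible reading of the inliner rule discussed above, written as a toy Python cost check. The thresholds, field names and `should_inline` are hypothetical. The idea is that a promoted static receives the static-function bonus only when its total call sites across all modules (for example, as counted from a summary) number exactly one. It does not simply assume a single call site. Comdat functions are not modeled.

```python
from dataclasses import dataclass

# Hypothetical values; a real inliner uses different numbers and many more terms.
BASE_THRESHOLD = 100
SINGLE_CALL_STATIC_BONUS = 1000


@dataclass
class Callee:
    size: int                   # estimated cost of the inlined body
    is_local: bool              # internal linkage in the current module
    was_promoted_static: bool   # internal before ThinLTO promotion
    total_call_sites: int       # call sites across all modules


def should_inline(callee: Callee) -> bool:
    threshold = BASE_THRESHOLD
    # If this is the only reference to a static function, inlining lets the
    # out-of-line copy be deleted, so code size does not grow. Promoted
    # statics are treated the same way, but the call-site count must cover
    # every module, since promotion lets other modules reference them.
    treat_as_static = callee.is_local or callee.was_promoted_static
    if treat_as_static and callee.total_call_sites == 1:
        threshold += SINGLE_CALL_STATIC_BONUS
    return callee.size <= threshold
```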
>
>
>
>> That said, we (well, actually Bruno) already prototyped it, with reasonably
>> good results :)
>> I’m not yet convinced that it should be independent of whether the function
>> was promoted, though.
>>
>
> Generally true (see the comdat case).
>
>
>>
>> Assuming we solve the inliner issue, there remain the “optimizations other
>> than the inliner”. We can probably solve most of them, but I suspect it
>> won’t be “trivial” either.
>>
>
>
> Any such optimizations in mind?
>
>
> I don’t have the details, but in short:
>
> For promoted functions: IPSCCP, dead argument elimination
> For promoted global variables: anything that is somehow affected by
> aliasing
>

When are you imagining that promotion would happen? If it happens just
before codegen (or bitcode emission), it wouldn't inhibit these
optimizations, right?
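Here is a toy model of the ordering question. Dead argument elimination may only change a function's signature when all of its callers are known, i.e. while the function is still internal. `Function`, `dead_arg_elim`, `promote` and `run` are hypothetical names, and callers in other modules are ignored. Running promotion first blocks the transformation. Promoting afterwards does not.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    """Toy IR function: a linkage and, per argument, whether it is used."""
    name: str
    linkage: str                                 # "internal" or "external"
    arg_used: list = field(default_factory=list)


def dead_arg_elim(f: Function) -> None:
    """Drop unused arguments, but only when every caller is known.

    With internal linkage all call sites are visible and can be rewritten.
    An external symbol may have unseen callers, so its signature stays.
    """
    if f.linkage == "internal":
        f.arg_used = [used for used in f.arg_used if used]


def promote(f: Function) -> None:
    """Toy stand-in for promotion: internal becomes external."""
    if f.linkage == "internal":
        f.linkage = "external"


def run(pipeline, f: Function) -> Function:
    for transform in pipeline:
        transform(f)
    return f
```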

-- 
Peter