[llvm-dev] [RFC] Enhance Partial Inliner by using a general outlining scheme for cold blocks

Tue Aug 29 09:15:19 PDT 2017

I second the fact that a way to outline specific function regions
independently of the partial inliner sound very useful. I am not sure
however if we would want a mode within the partialInliner or something
completely independent.

As a general question,   does anybody has a clear idea of what are the
constraints on the region CodeExtractor is currently able to handle ?
Going through the code, it looks like the only requirement is for the
header to dominate all the BB in the region ;

On Sat, Aug 26, 2017 at 9:52 AM, Xinliang David Li via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

>
>
> On Thu, Aug 24, 2017 at 12:47 PM, Graham Yiu <gyiu at ca.ibm.com> wrote:
>
>> Hi David,
>>
>> The only reason I can see to use the 'pattern matching' part as a
>> fall-back is in case we cannot inline the (what I'm assuming would be) a
>> much bigger hot-path-only cloned function for whatever reason. What I'm
>> assuming here is that after cold-region outlining, we may still have a
>> large portion of the original function body to attempt to inline, whereas
>> the pattern matching method will only contain a few basic blocks, giving a
>> better chance to inline something.
>>
>
> With profile data, the overhead of outlining a cold region can be
> estimated more accurately. (With the new PM), the threshold of inlining a
> hot callsite is also much higher. Without profile, the pattern matching
> method won't work too well in general even though it can enable more more
> inlining because the call overhead introduced to call the outlined function
> may outweigh the benefit of inlining the caller.
>
> What ever region that can be found by the pattern matching method should
> be identified by the new method as well. If there are multiple (but
> mutually exclusive) candidate regions found, the cost analysis heuristic
> should pick the best candidate region for outlining .
>
>
>>
>>
>> For your (2) point, I think we'll have to be careful here. Without a
>> sense of how 'likely' we're going to inline the new function, we'll have to
>> make sure our outlining of cold regions will not degrade the performance of
>> the function in 99.xx% of the cases, as it's unclear how much performance
>> we'll gain from just outlining (without inlining to increase the odds of
>> some performance gain). My initial thought was to ditch the new function
>> and its outlined children if we cannot immediately inline it.
>>
>
> The outlining only mode is useful to enable more aggressive inlining for
> the regular inlining pass. Slightly different heuristics can be applied
> here. For instance it can prefer largest candidate region (to maximiize the
> chance to inline the caller). The outlined region does not need to be super
> cold and leave it to the inliner to do more deeper analysis and decide to
> inline it right back in.
>
> David
>
>
>
>>
>>
>> Graham Yiu
>> LLVM Compiler Development
>> IBM Toronto Software Lab
>> Office: (905) 413-4077 C2-707/8200/Markham
>> Email: gyiu at ca.ibm.com
>>
>> [image: Inactive hide details for Xinliang David Li ---08/24/2017
>> 03:05:06 PM---On Thu, Aug 24, 2017 at 10:40 AM, Graham Yiu <gyiu at ca.i]Xinliang
>> David Li ---08/24/2017 03:05:06 PM---On Thu, Aug 24, 2017 at 10:40 AM,
>> Graham Yiu <gyiu at ca.ibm.com> wrote: > Hi David,
>>
>> From: Xinliang David Li <xinliangli at gmail.com>
>> To: Graham Yiu <gyiu at ca.ibm.com>
>> Cc: llvm-dev <llvm-dev at lists.llvm.org>
>> Date: 08/24/2017 03:05 PM
>>
>> Subject: Re: [llvm-dev] [RFC] Enhance Partial Inliner by using a general
>> outlining scheme for cold blocks
>> ------------------------------
>>
>>
>>
>>
>>
>> On Thu, Aug 24, 2017 at 10:40 AM, Graham Yiu <*gyiu at ca.ibm.com*
>> <gyiu at ca.ibm.com>> wrote:
>>
>>    Hi David,
>>
>>    So I've began doing some implementation on the outlining portion of
>>    the code. Currently, I got the partial inliner to outline cold regions
>>    (single entry, single exit) of the code, based solely on the existence of
>>    ProfileSummaryInfo (ie. profiling data). However, I have some concerns on
>>    how this will co-exist with the existing code that peels early returns.
>>
>>    The control flow looks something like this:
>>
>>    // New Code: find cold regions to outline
>>    if (!computeOutliningInfoForColdRegions()) {
>>    // If we can't find any cold regions, then fall-back to early return
>>    peeling
>>    if (!computeOutliningInfo) {
>>    return nullptr;
>>    }
>>    }
>>    // Try to outline the identified regions
>>    // Then try to inline the cloned function
>>
>>    My concern is during inlining, if we fail to inline the cloned
>>    function, we give up and discard all cloned and outlined functions. But
>>    with these two types of outlining we're doing, it's possible to attempt to
>>    inline the cloned function that has outlined cold regions, and if we cannot
>>    do so, try to inline a different clone that has peeled early returns (ie.
>>    the way we have it today). This would require us to clone the original
>>    function twice and modify one based on cold region outlining and the other
>>    early return peeling, with the latter being our fall-back option if we fail
>>    to inline the first clone.
>>
>>    What are your thoughts?
>>
>>
>>
>> I expect  computeOutliningInfoForColdRegions can produce a super set of
>> outlinable regions to the current 'pattern matching' approach. In other
>> words, most of the cases currently caught by 'computeOutlineInfo' should be
>> caught by the new algorithm, so why not ditching the current
>> 'computeOutlningInfo' completely?
>>
>> My suggestion was to enhance the pass to 1) support outlining multiple
>> regions; and 2) add a mode to do function outlining only (not the inlining
>> part).  The second is important can be used before the regular inliner
>> pass.   With the new pass manager and profile aware inlining, the inliner
>> won't undo the outline decision, but in meantime becomes more powerful due
>> to the reduced hot function size.
>>
>> David
>>
>>
>>
>>    Graham Yiu
>>    LLVM Compiler Development
>>    IBM Toronto Software Lab
>>    Office: *(905) 413-4077* <(905)%20413-4077> C2-707/8200/Markham
>>    Email: *gyiu at ca.ibm.com* <gyiu at ca.ibm.com>
>>
>>    [image: Inactive hide details for Graham Yiu---08/15/2017 08:04:28
>>    PM---Hey David, Yes, we'll need to consider the effect on live range]Graham
>>    Yiu---08/15/2017 08:04:28 PM---Hey David, Yes, we'll need to consider the
>>    effect on live ranges for regions we want to outline. In
>>
>>    From: Graham Yiu/Toronto/IBM
>>    To: Xinliang David Li <*xinliangli at gmail.com* <xinliangli at gmail.com>>
>>    Cc: llvm-dev <*llvm-dev at lists.llvm.org* <llvm-dev at lists.llvm.org>>
>>    Date: 08/15/2017 08:04 PM
>>    Subject: Re: [llvm-dev] [RFC] Enhance Partial Inliner by using a
>>    general outlining scheme for cold blocks
>>    ------------------------------
>>
>>
>>    Hey David,
>>
>>    Yes, we'll need to consider the effect on live ranges for regions we
>>    want to outline. In my experience, outlining live-exit regions seem to
>>    cause the most harm as we ruin chances to keep data in registers as you
>>    were alluding to. It's unclear, however, what the exact effect of outlining
>>    regions with live-entries would be.
>>
>>    I'll probably try to avoid regions that are not single entry & single
>>    exit at least initially, to simplify the transformation and analysis. Are
>>    multi-exit regions common in your experience?
>>
>>    And of course, I agree, we should reuse as much of the current
>>    partial inlining infrastructure as possible. I'll likely run some ideas by
>>    you as I begin to make changes.
>>
>>    Cheers,
>>
>>    Graham Yiu
>>    LLVM Compiler Development
>>    IBM Toronto Software Lab
>>    Office: *(905) 413-4077* <(905)%20413-4077> C2-407/8200/Markham
>>    Email: *gyiu at ca.ibm.com* <gyiu at ca.ibm.com>
>>
>>
>>    [image: Inactive hide details for Xinliang David Li ---08/15/2017
>>    05:36:07 PM---Hi Graham, Making partial inlining more general is some]Xinliang
>>    David Li ---08/15/2017 05:36:07 PM---Hi Graham, Making partial inlining
>>    more general is something worth doing. Regarding your implementat
>>
>>    From: Xinliang David Li <*xinliangli at gmail.com* <xinliangli at gmail.com>
>>    >
>>    To: Graham Yiu <*gyiu at ca.ibm.com* <gyiu at ca.ibm.com>>
>>    Cc: llvm-dev <*llvm-dev at lists.llvm.org* <llvm-dev at lists.llvm.org>>
>>    Date: 08/15/2017 05:36 PM
>>    Subject: Re: [llvm-dev] [RFC] Enhance Partial Inliner by using a
>>    general outlining scheme for cold blocks
>>    ------------------------------
>>
>>
>>
>>
>>    Hi Graham, Making partial inlining more general is something worth
>>    doing.  Regarding your implementation plan, I have some suggestions here:
>>
>>    *) Function outlining introduces additional runtime cost: passing of
>>    live in values, returning of live out values (via memory), glue code in the
>>    caller to handle regions without a single exit block etc.  The cost
>>    analysis needs to factor in those carefully
>>    *) Remove the limitation that there is only *one* outlined routine.
>>    Instead, the algorithm can compute multiple single-entry/single exit or
>>    single entry/multiple exit regions (cold ones) in the routine, and outline
>>    each region into its own function. The benefit include
>>       1) simplify the design and implementation and most of the existing
>>    code can be reused;
>>       2) provide more flexibility to allow most effective outlining;
>>       3) reduced runtime overhead of making calls to the outline
>>    functions.
>>
>>    thanks,
>>
>>    David
>>
>>    On Tue, Aug 15, 2017 at 11:22 AM, Graham Yiu via llvm-dev <
>>    *llvm-dev at lists.llvm.org* <llvm-dev at lists.llvm.org>> wrote:
>>       Hello,
>>
>>          My team and I are looking to do some enhancements in the
>>          partial inliner in opt. Would appreciate any feedback that folks might have.
>>
>>          # Partial Inlining in LLVM opt
>>
>>          ## Summary
>>
>>          ### Background
>>
>>          Currently, the partial inliner searches the first few blocks of
>>          the callee and looks for a branch to the return block (ie. early return).
>>          If found, it attempts to outline the rest of the slow (or heavy) code so
>>          the inliner will be able to inline the fast (or light) code. If no early
>>          returns are found, the partial inliner will give up. As far as I can tell,
>>          BlockFrequency and BranchProbability information is only used when
>>          attempting to inline the early return code, and not used to determine
>>          whether to outline the slow code.
>>
>>          ### Proposed changes
>>
>>          In addition to looking for early returns, we should utilize
>>          profile information to outline blocks that are considered cold. If we can
>>          sufficiently reduce the size of the original function via this type of
>>          outlining, inlining should be able to inline the rest of the hot code.
>>
>>          ## Details
>>
>>          With the presence of profile information, we have a view of
>>          what code is infrequently executed and make better decisions on what to
>>          outline. Early return blocks that are infrequently executed should still be
>>          included as candidates for outlining, but will be treated just like any
>>          other cold blocks. Without profiling information, however, we should remain
>>          conservative and only partial inline in the presence of an early return in
>>          the first few blocks of a function (ie. peel the early return out of the
>>          function).
>>
>>          To find cold regions to outline, we will traverse the CFG to
>>          find edges deemed 'cold' and look at the blocks dominated by the successor
>>          node. If, for some reason, that block has more than one predecessor, then
>>          we will skip this candidate as there should be a node that dominates this
>>          successor that has a single entry point. The last node in the dominance
>>          vector should also have a single successor. If it does not, then further
>>          investigation of the CFG is necessary to see when/how this situation occurs.
>>
>>          We will need several heuristics to make sure we only outline in
>>          cases where we are confident it will result in a performance gain. Things
>>          such as threshold on when a branch is considered cold, the minimum number
>>          of times the predecessor node has to be executed in order for an edge to be
>>          considered (confidence factor), and the minimum size of the region to be
>>          outlined (can use inlining cost analysis like we currently do) will require
>>          some level of tuning.
>>
>>          Similar to the current implementation, we will attempt to
>>          inline the leftover (hot) parts of the code, and if for some reason we
>>          cannot then we discard the modified function and its outlined code.
>>
>>          ### Code changes
>>
>>          The current Partial Inlining code first clones the function of
>>          interest and looks for a single set of blocks to outline. It then creates a
>>          function with the set the blocks, and saves the outlined function and
>>          outline callsite information as part of the function cloning container. In
>>          order to outline multiple regions of the function, we will need to change
>>          these containers to keep track of a list of regions to outline. We will
>>          also need to update the cost analysis to take into account multiple
>>          outlined functions.
>>
>>          When a ProfileSummary is available, then we should skip the
>>          code that looks for early returns and go into new code that looks for cold
>>          regions to outline. When ProfileSummary is not available, then we can fall
>>          back to the existing code and look for early returns only.
>>
>>          ### Tuning
>>
>>          - The outlining heuristics will need to determine if a set of
>>          cold blocks is large enough to warrant the overhead of a function call. We
>>          also don't want the inliner to attempt to inline the outlined code later.
>>          - The threshold for determining whether a block is cold will
>>          also need to be tuned. In the case that profiling information is not
>>          accurate, we will pay the price of the additional call overhead for
>>          executing cold code.
>>          - The confidence factor, which can be viewed as the minimum
>>          number of times the predecessor has to be executed in order for an edge to
>>          be considered cold, should also be taken into account to avoid outlining
>>          code paths we have little information on.
>>
>>          Graham Yiu
>>          LLVM Compiler Development
>>          IBM Toronto Software Lab
>>          Office: *(905) 413-4077* <(905)%20413-4077> C2-407/8200/Markham
>>          Email: *gyiu at ca.ibm.com* <gyiu at ca.ibm.com>
>>
>>          _______________________________________________
>>          LLVM Developers mailing list
>> *llvm-dev at lists.llvm.org* <llvm-dev at lists.llvm.org>
>> *http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev*
>>          <https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.llvm.org_cgi-2Dbin_mailman_listinfo_llvm-2Ddev&d=DwMFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=4ST7e3kMd0GTi3w9ByK5Cw&m=rbfPPnRP9weVvtwCT5LyhMrn3TeP6-HaVUUkv-DHQ5I&s=0NPYoALj0vvVlLnq4AKtctnM_tHFxPY6SsX5mv2LMUE&e=>
>>
>>
>>
>>
>>
>>
>>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170829/8e2c4473/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170829/8e2c4473/attachment.gif>