[llvm-dev] [RFC] Enhance Partial Inliner by using a general outlining scheme for cold blocks
River Riddle via llvm-dev
llvm-dev at lists.llvm.org
Tue Aug 29 10:52:44 PDT 2017
On Tue, Aug 29, 2017 at 10:24 AM, Graham Yiu via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hi Kader,
>
> I agree with you, if we were going to only do outlining for some functions
> and not immediately attempt to inline, it should be an independent pass.
> The partial inliner should do what its name suggests and attempt to inline
> or bail.
>
> I haven't looked through the CodeExtractor at all. I imagine I'll have to
> go through it at some point. I'd also be interested in something that does
> an analysis before code extraction that tells me how many live ranges I'm
> going to be killing or how many symbols I'm going to be taking the address
> of by extracting a specific region of code. Not sure if that currently
> exists.
>
CodeExtractor uses CodeExtractor::findInputsOutputs to determine the
inputs/outputs into the outlined region. It's a public utility and should
be roughly what you are looking for. It is used in conjunction with
CodeExtractor::findAllocas as it will try to hoist allocas that are only
used within the outlined region. Just look at
CodeExtractor::extractCodeRegion to see how they interact together. You
don't have to use them together, but you will lose information on some
unnecessary inputs.
Thanks,
River Riddle
>
>
> Graham Yiu
> LLVM Compiler Development
> IBM Toronto Software Lab
> Office: (905) 413-4077 C2-707/8200/Markham
> Email: gyiu at ca.ibm.com
>
> [image: Inactive hide details for keita abdoul-kader ---08/29/2017
> 12:15:25 PM---I second the fact that a way to outline specific funct]keita
> abdoul-kader ---08/29/2017 12:15:25 PM---I second the fact that a way to
> outline specific function regions independently of the partial inlin
>
> From: keita abdoul-kader <abdoulk.keita at gmail.com>
> To: Xinliang David Li <xinliangli at gmail.com>
> Cc: Graham Yiu <gyiu at ca.ibm.com>, llvm-dev <llvm-dev at lists.llvm.org>
> Date: 08/29/2017 12:15 PM
>
> Subject: Re: [llvm-dev] [RFC] Enhance Partial Inliner by using a general
> outlining scheme for cold blocks
> ------------------------------
>
>
>
> I second the fact that a way to outline specific function regions
> independently of the partial inliner sound very useful. I am not sure
> however if we would want a mode within the partialInliner or something
> completely independent.
>
> As a general question, does anybody has a clear idea of what are the
> constraints on the region CodeExtractor is currently able to handle ?
> Going through the code, it looks like the only requirement is for the
> header to dominate all the BB in the region ;
>
> On Sat, Aug 26, 2017 at 9:52 AM, Xinliang David Li via llvm-dev <
> *llvm-dev at lists.llvm.org* <llvm-dev at lists.llvm.org>> wrote:
>
>
>
> On Thu, Aug 24, 2017 at 12:47 PM, Graham Yiu <*gyiu at ca.ibm.com*
> <gyiu at ca.ibm.com>> wrote:
> Hi David,
>
> The only reason I can see to use the 'pattern matching' part as a
> fall-back is in case we cannot inline the (what I'm assuming would be) a
> much bigger hot-path-only cloned function for whatever reason. What I'm
> assuming here is that after cold-region outlining, we may still have a
> large portion of the original function body to attempt to inline, whereas
> the pattern matching method will only contain a few basic blocks, giving a
> better chance to inline something.
>
>
> With profile data, the overhead of outlining a cold region can be
> estimated more accurately. (With the new PM), the threshold of inlining a
> hot callsite is also much higher. Without profile, the pattern matching
> method won't work too well in general even though it can enable more more
> inlining because the call overhead introduced to call the outlined function
> may outweigh the benefit of inlining the caller.
>
> What ever region that can be found by the pattern matching method
> should be identified by the new method as well. If there are multiple (but
> mutually exclusive) candidate regions found, the cost analysis heuristic
> should pick the best candidate region for outlining .
>
>
>
> For your (2) point, I think we'll have to be careful here. Without a
> sense of how 'likely' we're going to inline the new function, we'll have to
> make sure our outlining of cold regions will not degrade the performance of
> the function in 99.xx% of the cases, as it's unclear how much performance
> we'll gain from just outlining (without inlining to increase the odds of
> some performance gain). My initial thought was to ditch the new function
> and its outlined children if we cannot immediately inline it.
>
>
> The outlining only mode is useful to enable more aggressive inlining
> for the regular inlining pass. Slightly different heuristics can be applied
> here. For instance it can prefer largest candidate region (to maximiize the
> chance to inline the caller). The outlined region does not need to be super
> cold and leave it to the inliner to do more deeper analysis and decide to
> inline it right back in.
>
> David
>
>
>
>
> Graham Yiu
> LLVM Compiler Development
> IBM Toronto Software Lab
> Office: *(905) 413-4077* <(905)%20413-4077> C2-707/8200/Markham
> Email: *gyiu at ca.ibm.com* <gyiu at ca.ibm.com>
>
> [image: Inactive hide details for Xinliang David Li ---08/24/2017
> 03:05:06 PM---On Thu, Aug 24, 2017 at 10:40 AM, Graham Yiu <gyiu at ca.i]Xinliang
> David Li ---08/24/2017 03:05:06 PM---On Thu, Aug 24, 2017 at 10:40 AM,
> Graham Yiu <*gyiu at ca.ibm.com* <gyiu at ca.ibm.com>> wrote: > Hi David,
>
> From: Xinliang David Li <*xinliangli at gmail.com* <xinliangli at gmail.com>>
> To: Graham Yiu <*gyiu at ca.ibm.com* <gyiu at ca.ibm.com>>
> Cc: llvm-dev <*llvm-dev at lists.llvm.org* <llvm-dev at lists.llvm.org>>
> Date: 08/24/2017 03:05 PM
>
>
> Subject: Re: [llvm-dev] [RFC] Enhance Partial Inliner by using a
> general outlining scheme for cold blocks
> ------------------------------
>
>
>
>
>
> On Thu, Aug 24, 2017 at 10:40 AM, Graham Yiu <*gyiu at ca.ibm.com*
> <gyiu at ca.ibm.com>> wrote:
> Hi David,
>
> So I've began doing some implementation on the outlining portion
> of the code. Currently, I got the partial inliner to outline cold regions
> (single entry, single exit) of the code, based solely on the existence of
> ProfileSummaryInfo (ie. profiling data). However, I have some concerns on
> how this will co-exist with the existing code that peels early returns.
>
> The control flow looks something like this:
>
> // New Code: find cold regions to outline
> if (!computeOutliningInfoForColdRegions()) {
> // If we can't find any cold regions, then fall-back to early
> return peeling
> if (!computeOutliningInfo) {
> return nullptr;
> }
> }
> // Try to outline the identified regions
> // Then try to inline the cloned function
>
> My concern is during inlining, if we fail to inline the cloned
> function, we give up and discard all cloned and outlined functions. But
> with these two types of outlining we're doing, it's possible to attempt to
> inline the cloned function that has outlined cold regions, and if we cannot
> do so, try to inline a different clone that has peeled early returns (ie.
> the way we have it today). This would require us to clone the original
> function twice and modify one based on cold region outlining and the other
> early return peeling, with the latter being our fall-back option if we fail
> to inline the first clone.
>
> What are your thoughts?
>
>
> I expect computeOutliningInfoForColdRegions can produce a super set
> of outlinable regions to the current 'pattern matching' approach. In other
> words, most of the cases currently caught by 'computeOutlineInfo' should be
> caught by the new algorithm, so why not ditching the current
> 'computeOutlningInfo' completely?
>
> My suggestion was to enhance the pass to 1) support outlining multiple
> regions; and 2) add a mode to do function outlining only (not the inlining
> part). The second is important can be used before the regular inliner
> pass. With the new pass manager and profile aware inlining, the inliner
> won't undo the outline decision, but in meantime becomes more powerful due
> to the reduced hot function size.
>
> David
>
>
> Graham Yiu
> LLVM Compiler Development
> IBM Toronto Software Lab
> Office: *(905) 413-4077* <(905)%20413-4077> C2-707/8200/Markham
> Email: *gyiu at ca.ibm.com* <gyiu at ca.ibm.com>
>
> [image: Inactive hide details for Graham Yiu---08/15/2017
> 08:04:28 PM---Hey David, Yes, we'll need to consider the effect on live
> range]Graham Yiu---08/15/2017 08:04:28 PM---Hey David, Yes,
> we'll need to consider the effect on live ranges for regions we want to
> outline. In
>
> From: Graham Yiu/Toronto/IBM
> To: Xinliang David Li <*xinliangli at gmail.com*
> <xinliangli at gmail.com>>
> Cc: llvm-dev <*llvm-dev at lists.llvm.org* <llvm-dev at lists.llvm.org>
> >
> Date: 08/15/2017 08:04 PM
> Subject: Re: [llvm-dev] [RFC] Enhance Partial Inliner by using a
> general outlining scheme for cold blocks
> ------------------------------
>
>
> Hey David,
>
> Yes, we'll need to consider the effect on live ranges for
> regions we want to outline. In my experience, outlining live-exit regions
> seem to cause the most harm as we ruin chances to keep data in registers as
> you were alluding to. It's unclear, however, what the exact effect of
> outlining regions with live-entries would be.
>
> I'll probably try to avoid regions that are not single entry &
> single exit at least initially, to simplify the transformation and
> analysis. Are multi-exit regions common in your experience?
>
> And of course, I agree, we should reuse as much of the current
> partial inlining infrastructure as possible. I'll likely run some ideas by
> you as I begin to make changes.
>
> Cheers,
>
> Graham Yiu
> LLVM Compiler Development
> IBM Toronto Software Lab
> Office: *(905) 413-4077* <(905)%20413-4077> C2-407/8200/Markham
> Email: *gyiu at ca.ibm.com* <gyiu at ca.ibm.com>
>
>
> [image: Inactive hide details for Xinliang David Li
> ---08/15/2017 05:36:07 PM---Hi Graham, Making partial inlining more general
> is some]Xinliang David Li ---08/15/2017 05:36:07 PM---Hi Graham,
> Making partial inlining more general is something worth doing. Regarding
> your implementat
>
> From: Xinliang David Li <*xinliangli at gmail.com*
> <xinliangli at gmail.com>>
> To: Graham Yiu <*gyiu at ca.ibm.com* <gyiu at ca.ibm.com>>
> Cc: llvm-dev <*llvm-dev at lists.llvm.org* <llvm-dev at lists.llvm.org>
> >
> Date: 08/15/2017 05:36 PM
> Subject: Re: [llvm-dev] [RFC] Enhance Partial Inliner by using a
> general outlining scheme for cold blocks
> ------------------------------
>
>
>
>
> Hi Graham, Making partial inlining more general is something
> worth doing. Regarding your implementation plan, I have some suggestions
> here:
>
> *) Function outlining introduces additional runtime cost:
> passing of live in values, returning of live out values (via memory), glue
> code in the caller to handle regions without a single exit block etc. The
> cost analysis needs to factor in those carefully
> *) Remove the limitation that there is only *one* outlined
> routine. Instead, the algorithm can compute multiple single-entry/single
> exit or single entry/multiple exit regions (cold ones) in the routine, and
> outline each region into its own function. The benefit include
> 1) simplify the design and implementation and most of the
> existing code can be reused;
> 2) provide more flexibility to allow most effective outlining;
> 3) reduced runtime overhead of making calls to the outline
> functions.
>
> thanks,
>
> David
>
> On Tue, Aug 15, 2017 at 11:22 AM, Graham Yiu via llvm-dev <
> *llvm-dev at lists.llvm.org* <llvm-dev at lists.llvm.org>> wrote:
> Hello,
>
> My team and I are looking to do some enhancements in
> the partial inliner in opt. Would appreciate any feedback that folks might
> have.
>
> # Partial Inlining in LLVM opt
>
> ## Summary
>
> ### Background
>
> Currently, the partial inliner searches the first
> few blocks of the callee and looks for a branch to the return block (ie.
> early return). If found, it attempts to outline the rest of the slow (or
> heavy) code so the inliner will be able to inline the fast (or light) code.
> If no early returns are found, the partial inliner will give up. As far as
> I can tell, BlockFrequency and BranchProbability information is only used
> when attempting to inline the early return code, and not used to determine
> whether to outline the slow code.
>
> ### Proposed changes
>
> In addition to looking for early returns, we should
> utilize profile information to outline blocks that are considered cold. If
> we can sufficiently reduce the size of the original function via this type
> of outlining, inlining should be able to inline the rest of the hot code.
>
> ## Details
>
> With the presence of profile information, we have a
> view of what code is infrequently executed and make better decisions on
> what to outline. Early return blocks that are infrequently executed should
> still be included as candidates for outlining, but will be treated just
> like any other cold blocks. Without profiling information, however, we
> should remain conservative and only partial inline in the presence of an
> early return in the first few blocks of a function (ie. peel the early
> return out of the function).
>
> To find cold regions to outline, we will traverse
> the CFG to find edges deemed 'cold' and look at the blocks dominated by the
> successor node. If, for some reason, that block has more than one
> predecessor, then we will skip this candidate as there should be a node
> that dominates this successor that has a single entry point. The last node
> in the dominance vector should also have a single successor. If it does
> not, then further investigation of the CFG is necessary to see when/how
> this situation occurs.
>
> We will need several heuristics to make sure we only
> outline in cases where we are confident it will result in a performance
> gain. Things such as threshold on when a branch is considered cold, the
> minimum number of times the predecessor node has to be executed in order
> for an edge to be considered (confidence factor), and the minimum size of
> the region to be outlined (can use inlining cost analysis like we currently
> do) will require some level of tuning.
>
> Similar to the current implementation, we will
> attempt to inline the leftover (hot) parts of the code, and if for some
> reason we cannot then we discard the modified function and its outlined
> code.
>
> ### Code changes
>
> The current Partial Inlining code first clones the
> function of interest and looks for a single set of blocks to outline. It
> then creates a function with the set the blocks, and saves the outlined
> function and outline callsite information as part of the function cloning
> container. In order to outline multiple regions of the function, we will
> need to change these containers to keep track of a list of regions to
> outline. We will also need to update the cost analysis to take into account
> multiple outlined functions.
>
> When a ProfileSummary is available, then we should
> skip the code that looks for early returns and go into new code that looks
> for cold regions to outline. When ProfileSummary is not available, then we
> can fall back to the existing code and look for early returns only.
>
> ### Tuning
>
> - The outlining heuristics will need to determine if
> a set of cold blocks is large enough to warrant the overhead of a function
> call. We also don't want the inliner to attempt to inline the outlined code
> later.
> - The threshold for determining whether a block is
> cold will also need to be tuned. In the case that profiling information is
> not accurate, we will pay the price of the additional call overhead for
> executing cold code.
> - The confidence factor, which can be viewed as the
> minimum number of times the predecessor has to be executed in order for an
> edge to be considered cold, should also be taken into account to avoid
> outlining code paths we have little information on.
>
> Graham Yiu
> LLVM Compiler Development
> IBM Toronto Software Lab
> Office: *(905) 413-4077* <(905)%20413-4077>
> C2-407/8200/Markham
> Email: *gyiu at ca.ibm.com* <gyiu at ca.ibm.com>
>
> _______________________________________________
> LLVM Developers mailing list
> *llvm-dev at lists.llvm.org* <llvm-dev at lists.llvm.org>
> *http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev*
> <https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.llvm.org_cgi-2Dbin_mailman_listinfo_llvm-2Ddev&d=DwMFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=4ST7e3kMd0GTi3w9ByK5Cw&m=rbfPPnRP9weVvtwCT5LyhMrn3TeP6-HaVUUkv-DHQ5I&s=0NPYoALj0vvVlLnq4AKtctnM_tHFxPY6SsX5mv2LMUE&e=>
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> *llvm-dev at lists.llvm.org* <llvm-dev at lists.llvm.org>
> *http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev*
> <https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.llvm.org_cgi-2Dbin_mailman_listinfo_llvm-2Ddev&d=DwMFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=4ST7e3kMd0GTi3w9ByK5Cw&m=JmHSq8JxRpKOvzgsSuhWaAIzQgeYck1L-m_FSkgn7vw&s=-8UO_5yk7LKsuPAyNXhmaeaGemDfuTFOcbjt3SpjL7E&e=>
>
>
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170829/06cc8cba/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170829/06cc8cba/attachment-0001.gif>
More information about the llvm-dev
mailing list