[llvm-dev] [RFC] Enhance Partial Inliner by using a general outlining scheme for cold blocks

Tue Aug 15 16:22:05 PDT 2017

On Tue, Aug 15, 2017 at 4:14 PM, River Riddle via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hey Graham,
>  I worked on pretty much this exact thing last year.  I did something
> similar to what you described, I traversed the CFG and built potentially
> profitable regions from any given valid start node. At that point there
> were several road blocks that prevented it from really going anywhere( like
> the inliner not preserving pgo data) but those problems should be gone by
> now.
> Some thoughts:
>
> : In terms of cost modeling, you will want to consider the interaction
> with loops:
>  - Whats the impact of outlining from inside of a loop.
>  - What if the hot region does not include the entry/exit, i.e hot loop.
>
> : Something that would also benefit, but would require more tuning, is the
> relationship between size and "hotness". If for example you have a function
> that looks like this:
>
> foo(bool B) {
> ... // Code that prevents early return.
>
> if(B) // RegionA : Taken 50% of the time.
>   ... Very large amount of code.
>
> else // RegionB : Taken 50% of the time.
>   ... Very small amount of code.
> }
>
> In the above you have two regions that aren't exactly cold. Outlining
> RegionA will make foo profitable to inline, but only if you take into
> account its large size. Using profile data as the sole cutoff will miss
> this opportunity.
>
>

Modelling the cost/benefit for this case can be very tricky though.

>
> : You may also want to look into hot switch statement splitting, though I
> don't know if it should really go into the partial inliner. This is
> splitting a switch statement into multiple switch statements if there are
> only a few cases that are actually hot.
>
> SWITCH1:
> switch(val) {
>   HOT: ...
>   COLD1: ...
>   COLD2: ...
>   ....
>   COLDN: ...
>   default: ...
> }
>
> would get transformed into:
>

>
> SWITCH1:
> switch(val) {
>  HOT: ...
>  default:
>    br SWITCH2:
> }
>
> SWITCH2:
> switch(val) {
>   COLD1: ...
>   COLD2: ...
>   ....
>   COLDN: ...
>   default: ...
> }
>
>
>

This is profile based switch peeling -- which is a separate optimization
from partial inlining.

David

> On Tue, Aug 15, 2017 at 1:14 PM, Graham Yiu via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi Jessica,
>>
>> Thanks for the feedback.
>>
>> I believe the existing partial inliner pass does use some common
>> utilities like the code extractor to do outlining.
>>
> AFAIK, Code extractor is the only IR level utility for outlining, but it's
> not the best and definitely needs some improvement. It currently returns
> outputs via live out pointer parameters, which means that outlined
> functions can't be tagged with attributes like readonly.
>
> There were also some known problems with extracting in the presence of
> critical edges, but I don't know if thats still the case.
>
>
>> Is that what you're referring to? I don't recall seeing any other passes
>> that has outlining other than the machine outliner, but I may have missed
>> something.
>>
>> I briefly looked at River's RFC and it seems he's mainly utilizing
>> outlining as a tool to reduce code size while maintaining code performance.
>>
>
> The main point of the outliner I proposed in that RFC is benefiting code
> size. The use of PGO data is to simply avoid detrimenting runtime
> performance.
>
>
>> Our proposal is to improve on the existing outlining scheme of the
>> partial inliner to give us more opportunities to inline hot code. We will
>> only do this type of general outlining with the presence of profiling or
>> user data (such as builtin_expect).
>>
> With that said, I am interested in how River plans to handle live ranges
>>
>
>  It's a pretty difficult problem. I have estimation for register usage
> within an outlined region, using something similar to the register cost
> calculation in the loop vectorizer. I don't currently have anything for the
> actual call to the function itself. I haven't found a nice enough solution
> that can work for a large number of candidates.
>
> and heuristics he will use with profiling data available.
>>
> As I said above, the use of profile data, currently, is to avoid
> regressing execution time. So the heuristics are basically "Is this too hot
> to outline from". That outliner also has a completely different goal/cost
> model, which makes applying it to this problem difficult.
>
> Thanks,
>  River Riddle
>
>
>>
>> I haven't looked at the machine outliner yet, but my understanding is it
>> also prioritizes code size reduction over performance gains. Is my
>> assumption correct? Do you think there's some tuning we can do in the
>> machine outliner to tailor for performance improvements?
>>
>> Graham Yiu
>> LLVM Compiler Development
>> IBM Toronto Software Lab
>> Office: (905) 413-4077 C2-407/8200/Markham
>> Email: gyiu at ca.ibm.com
>>
>> [image: Inactive hide details for Jessica Paquette ---08/15/2017 02:45:06
>> PM---Hi Graham, Have you looked at using the existing outline]Jessica
>> Paquette ---08/15/2017 02:45:06 PM---Hi Graham, Have you looked at using
>> the existing outliner framework to implement partial inlining? I
>>
>> From: Jessica Paquette <jpaquette at apple.com>
>> To: Graham Yiu <gyiu at ca.ibm.com>
>> Cc: llvm-dev at lists.llvm.org
>> Date: 08/15/2017 02:45 PM
>> Subject: Re: [llvm-dev] [RFC] Enhance Partial Inliner by using a general
>> outlining scheme for cold blocks
>> Sent by: jpaquette at apple.com
>> ------------------------------
>>
>>
>>
>> Hi Graham,
>>
>> Have you looked at using the existing outliner framework to implement
>> partial inlining? In the past, people have combined outlining + inlining to
>> get partial inlining “for free” (see "Function outlining and partial
>> inlining” by Zhao and Amaral).
>>
>> There is the MachineOutliner. River Riddle has been working on an
>> IR-level outliner as well. The MachineOutliner doesn’t currently consider
>> the heat of a block, but might be worth experimenting with. It might be
>> good to experiment with River’s work and the existing outlining pass to see
>> if this functionality can effectively fall out of what we have already.
>>
>> Here’s the recent IR-level outlining thread for reference wrt River’s
>> work: *http://lists.llvm.org/pipermail/llvm-dev/2017-July/115666.html*
>> <http://lists.llvm.org/pipermail/llvm-dev/2017-July/115666.html>
>>
>> - Jessica
>>
>>    On Aug 15, 2017, at 11:22 AM, Graham Yiu via llvm-dev <
>>       *llvm-dev at lists.llvm.org* <llvm-dev at lists.llvm.org>> wrote:
>>
>>       Hello,
>>
>>       My team and I are looking to do some enhancements in the partial
>>       inliner in opt. Would appreciate any feedback that folks might have.
>>
>>       # Partial Inlining in LLVM opt
>>
>>       ## Summary
>>
>>       ### Background
>>
>>       Currently, the partial inliner searches the first few blocks of
>>       the callee and looks for a branch to the return block (ie. early return).
>>       If found, it attempts to outline the rest of the slow (or heavy) code so
>>       the inliner will be able to inline the fast (or light) code. If no early
>>       returns are found, the partial inliner will give up. As far as I can tell,
>>       BlockFrequency and BranchProbability information is only used when
>>       attempting to inline the early return code, and not used to determine
>>       whether to outline the slow code.
>>
>>       ### Proposed changes
>>
>>       In addition to looking for early returns, we should utilize
>>       profile information to outline blocks that are considered cold. If we can
>>       sufficiently reduce the size of the original function via this type of
>>       outlining, inlining should be able to inline the rest of the hot code.
>>
>>       ## Details
>>
>>       With the presence of profile information, we have a view of what
>>       code is infrequently executed and make better decisions on what to outline.
>>       Early return blocks that are infrequently executed should still be included
>>       as candidates for outlining, but will be treated just like any other cold
>>       blocks. Without profiling information, however, we should remain
>>       conservative and only partial inline in the presence of an early return in
>>       the first few blocks of a function (ie. peel the early return out of the
>>       function).
>>
>>       To find cold regions to outline, we will traverse the CFG to find
>>       edges deemed 'cold' and look at the blocks dominated by the successor node.
>>       If, for some reason, that block has more than one predecessor, then we will
>>       skip this candidate as there should be a node that dominates this successor
>>       that has a single entry point. The last node in the dominance vector should
>>       also have a single successor. If it does not, then further investigation of
>>       the CFG is necessary to see when/how this situation occurs.
>>
>>       We will need several heuristics to make sure we only outline in
>>       cases where we are confident it will result in a performance gain. Things
>>       such as threshold on when a branch is considered cold, the minimum number
>>       of times the predecessor node has to be executed in order for an edge to be
>>       considered (confidence factor), and the minimum size of the region to be
>>       outlined (can use inlining cost analysis like we currently do) will require
>>       some level of tuning.
>>
>>       Similar to the current implementation, we will attempt to inline
>>       the leftover (hot) parts of the code, and if for some reason we cannot then
>>       we discard the modified function and its outlined code.
>>
>>       ### Code changes
>>
>>       The current Partial Inlining code first clones the function of
>>       interest and looks for a single set of blocks to outline. It then creates a
>>       function with the set the blocks, and saves the outlined function and
>>       outline callsite information as part of the function cloning container. In
>>       order to outline multiple regions of the function, we will need to change
>>       these containers to keep track of a list of regions to outline. We will
>>       also need to update the cost analysis to take into account multiple
>>       outlined functions.
>>
>>       When a ProfileSummary is available, then we should skip the code
>>       that looks for early returns and go into new code that looks for cold
>>       regions to outline. When ProfileSummary is not available, then we can fall
>>       back to the existing code and look for early returns only.
>>
>>       ### Tuning
>>
>>       - The outlining heuristics will need to determine if a set of cold
>>       blocks is large enough to warrant the overhead of a function call. We also
>>       don't want the inliner to attempt to inline the outlined code later.
>>       - The threshold for determining whether a block is cold will also
>>       need to be tuned. In the case that profiling information is not accurate,
>>       we will pay the price of the additional call overhead for executing cold
>>       code.
>>       - The confidence factor, which can be viewed as the minimum number
>>       of times the predecessor has to be executed in order for an edge to be
>>       considered cold, should also be taken into account to avoid outlining code
>>       paths we have little information on.
>>
>>       Graham Yiu
>>       LLVM Compiler Development
>>       IBM Toronto Software Lab
>>       Office: (905) 413-4077 C2-407/8200/Markham
>> *Email: gyiu at ca.ibm.com* <gyiu at ca.ibm.com>
>>
>>       _______________________________________________
>>       LLVM Developers mailing list
>> *llvm-dev at lists.llvm.org* <llvm-dev at lists.llvm.org>
>>       http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170815/9af62420/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170815/9af62420/attachment-0001.gif>