[llvm-dev] [RFC] Enhance Partial Inliner by using a general outlining scheme for cold blocks
Xinliang David Li via llvm-dev
llvm-dev at lists.llvm.org
Tue Aug 15 16:22:05 PDT 2017
On Tue, Aug 15, 2017 at 4:14 PM, River Riddle via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hey Graham,
> I worked on pretty much this exact thing last year. I did something
> similar to what you described: I traversed the CFG and built potentially
> profitable regions from any given valid start node. At that point there
> were several roadblocks that prevented it from really going anywhere (like
> the inliner not preserving PGO data), but those problems should be gone by
> now.
> Some thoughts:
>
> : In terms of cost modeling, you will want to consider the interaction
> with loops:
> - What is the impact of outlining from inside a loop?
> - What if the hot region does not include the entry/exit, i.e., a hot loop?
>
> : Something that would also benefit, but would require more tuning, is the
> relationship between size and "hotness". If, for example, you have a
> function that looks like this:
>
> foo(bool B) {
>   ... // Code that prevents early return.
>
>   if (B)   // RegionA : Taken 50% of the time.
>     ... Very large amount of code.
>
>   else     // RegionB : Taken 50% of the time.
>     ... Very small amount of code.
> }
>
> In the above you have two regions that aren't exactly cold. Outlining
> RegionA will make foo profitable to inline, but only if you take into
> account its large size. Using profile data as the sole cutoff will miss
> this opportunity.
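>
> As a rough sketch (hypothetical code, not the output of any existing pass;
> foo_regionA is a name made up for illustration), the shape after outlining
> RegionA would be something like this, leaving foo small enough for the
> inliner to consider:
>
> // Hypothetical result of outlining RegionA from foo.
> static void foo_regionA() {
>   // ... Very large amount of code, now out of line.
> }
>
> void foo(bool B) {
>   // ... Code that prevents early return.
>   if (B)
>     foo_regionA();   // RegionA reduced to a single call.
>   else {
>     // ... Very small amount of code stays inline.
>   }
> }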
>
>
Modelling the cost/benefit for this case can be very tricky, though.
>
> : You may also want to look into hot switch statement splitting, though I
> don't know if it should really go into the partial inliner. This is
> splitting a switch statement into multiple switch statements if there are
> only a few cases that are actually hot.
>
> SWITCH1:
>   switch (val) {
>   HOT:     ...
>   COLD1:   ...
>   COLD2:   ...
>   ...
>   COLDN:   ...
>   default: ...
>   }
>
> would get transformed into:
>
>
> SWITCH1:
>   switch (val) {
>   HOT: ...
>   default:
>     br SWITCH2
>   }
>
> SWITCH2:
>   switch (val) {
>   COLD1:   ...
>   COLD2:   ...
>   ...
>   COLDN:   ...
>   default: ...
>   }
>
>
>
This is profile-based switch peeling, which is a separate optimization
from partial inlining.
David
> On Tue, Aug 15, 2017 at 1:14 PM, Graham Yiu via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi Jessica,
>>
>> Thanks for the feedback.
>>
>> I believe the existing partial inliner pass does use some common
>> utilities like the code extractor to do outlining.
>>
> AFAIK, CodeExtractor is the only IR-level utility for outlining, but it's
> not the best and definitely needs some improvement. It currently returns
> outputs via live-out pointer parameters, which means that outlined
> functions can't be tagged with attributes like readonly.
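>
> As a rough illustration of the problem (a hypothetical C++-level shape, not
> CodeExtractor's actual output), an extracted region with one live-out value
> ends up looking something like this; the store through the out-pointer is
> what blocks attributes like readonly:
>
> // Hypothetical shape of an extracted cold region with one live-out value.
> static void bar_cold_region(int X, int *LiveOut) {
>   int Tmp = X * 2;   // stand-in for the outlined computation
>   *LiveOut = Tmp;    // result is passed back through a pointer parameter,
>                      // so the outlined function writes memory
> }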
>
> There were also some known problems with extracting in the presence of
> critical edges, but I don't know if that's still the case.
>
>
>> Is that what you're referring to? I don't recall seeing any other passes
>> that do outlining other than the machine outliner, but I may have missed
>> something.
>>
>> I briefly looked at River's RFC and it seems he's mainly utilizing
>> outlining as a tool to reduce code size while maintaining code performance.
>>
>
> The main point of the outliner I proposed in that RFC is benefiting code
> size. The use of PGO data is simply to avoid degrading runtime
> performance.
>
>
>> Our proposal is to improve on the existing outlining scheme of the
>> partial inliner to give us more opportunities to inline hot code. We will
>> only do this type of general outlining in the presence of profiling or
>> user data (such as builtin_expect).
>>
> With that said, I am interested in how River plans to handle live ranges
>>
>
> It's a pretty difficult problem. I have an estimate of register usage
> within an outlined region, using something similar to the register cost
> calculation in the loop vectorizer. I don't currently have anything for the
> actual call to the function itself. I haven't found a nice enough solution
> that can work for a large number of candidates.
>
> and heuristics he will use with profiling data available.
>>
> As I said above, the use of profile data, currently, is to avoid
> regressing execution time. So the heuristics are basically "Is this too hot
> to outline from?". That outliner also has a completely different goal/cost
> model, which makes applying it to this problem difficult.
>
> Thanks,
> River Riddle
>
>
>>
>> I haven't looked at the machine outliner yet, but my understanding is it
>> also prioritizes code size reduction over performance gains. Is my
>> assumption correct? Do you think there's some tuning we can do in the
>> machine outliner to tailor it for performance improvements?
>>
>> Graham Yiu
>> LLVM Compiler Development
>> IBM Toronto Software Lab
>> Office: (905) 413-4077 C2-407/8200/Markham
>> Email: gyiu at ca.ibm.com
>>
>>
>> From: Jessica Paquette <jpaquette at apple.com>
>> To: Graham Yiu <gyiu at ca.ibm.com>
>> Cc: llvm-dev at lists.llvm.org
>> Date: 08/15/2017 02:45 PM
>> Subject: Re: [llvm-dev] [RFC] Enhance Partial Inliner by using a general
>> outlining scheme for cold blocks
>> Sent by: jpaquette at apple.com
>> ------------------------------
>>
>>
>>
>> Hi Graham,
>>
>> Have you looked at using the existing outliner framework to implement
>> partial inlining? In the past, people have combined outlining + inlining to
>> get partial inlining “for free” (see “Function Outlining and Partial
>> Inlining” by Zhao and Amaral).
>>
>> There is the MachineOutliner. River Riddle has been working on an
>> IR-level outliner as well. The MachineOutliner doesn’t currently consider
>> the heat of a block, but might be worth experimenting with. It might be
>> good to experiment with River’s work and the existing outlining pass to see
>> if this functionality can effectively fall out of what we have already.
>>
>> Here’s the recent IR-level outlining thread for reference wrt River’s
>> work: http://lists.llvm.org/pipermail/llvm-dev/2017-July/115666.html
>>
>> - Jessica
>>
>> On Aug 15, 2017, at 11:22 AM, Graham Yiu via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>> Hello,
>>
>> My team and I are looking to do some enhancements in the partial
>> inliner in opt. Would appreciate any feedback that folks might have.
>>
>> # Partial Inlining in LLVM opt
>>
>> ## Summary
>>
>> ### Background
>>
>> Currently, the partial inliner searches the first few blocks of
>> the callee and looks for a branch to the return block (i.e., an early
>> return). If found, it attempts to outline the rest of the slow (or heavy)
>> code so the inliner will be able to inline the fast (or light) code. If no
>> early returns are found, the partial inliner gives up. As far as I can
>> tell, BlockFrequency and BranchProbability information is only used when
>> attempting to inline the early return code, and not used to determine
>> whether to outline the slow code.
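>>
>> For illustration, the kind of function shape the current scheme targets
>> looks roughly like this (a hypothetical example; the type and helper are
>> made up):
>>
>> struct Item;              // hypothetical type, for illustration only
>> int heavyWork(Item *I);   // hypothetical slow-path helper
>>
>> int process(Item *I) {
>>   if (I == nullptr)       // early return found in the first few blocks
>>     return 0;             // fast (light) path: kept, so callers can inline it
>>   // ... large amount of slow (heavy) code ...
>>   return heavyWork(I);    // slow path: outlined into a separate function
>> }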
>>
>> ### Proposed changes
>>
>> In addition to looking for early returns, we should utilize
>> profile information to outline blocks that are considered cold. If we can
>> sufficiently reduce the size of the original function via this type of
>> outlining, the inliner should be able to inline the rest of the hot code.
>>
>> ## Details
>>
>> With profile information present, we have a view of which
>> code is infrequently executed and can make better decisions about what to
>> outline. Early return blocks that are infrequently executed should still be
>> included as candidates for outlining, but will be treated just like any
>> other cold blocks. Without profiling information, however, we should remain
>> conservative and only partially inline in the presence of an early return
>> in the first few blocks of a function (i.e., peel the early return out of
>> the function).
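>>
>> As a small source-level example (hypothetical code), this is the kind of
>> cold block we would like to outline when profile data or a user hint such
>> as builtin_expect marks it as rarely taken, even though it does not sit
>> behind an early return:
>>
>> // Hypothetical example: the failure path is expected to be cold, so it
>> // is a candidate for outlining; the hot path stays small and inlinable.
>> void update(int *Counter, bool Failed) {
>>   if (__builtin_expect(Failed, 0)) {
>>     // ... rare, relatively large error-handling code ...
>>     *Counter = -1;
>>   } else {
>>     ++*Counter;
>>   }
>> }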
>>
>> To find cold regions to outline, we will traverse the CFG to find
>> edges deemed 'cold' and look at the blocks dominated by the successor node.
>> If, for some reason, the successor block has more than one predecessor, we
>> will skip this candidate, as there should be a node that dominates this
>> successor and has a single entry point. The last node in the dominance
>> vector should also have a single successor. If it does not, then further
>> investigation of the CFG is necessary to see when/how this situation
>> occurs.
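>>
>> A rough sketch of that traversal, using the existing BranchProbabilityInfo
>> and DominatorTree interfaces, is shown below; the cold-probability
>> threshold and the CandidateRegion container are placeholders, not existing
>> pass code:
>>
>> #include "llvm/ADT/SmallVector.h"
>> #include "llvm/Analysis/BranchProbabilityInfo.h"
>> #include "llvm/IR/CFG.h"
>> #include "llvm/IR/Dominators.h"
>> #include "llvm/IR/Function.h"
>> using namespace llvm;
>>
>> // Placeholder container for one candidate cold region.
>> struct CandidateRegion {
>>   SmallVector<BasicBlock *, 8> Blocks;
>> };
>>
>> static void collectColdRegions(Function &F, BranchProbabilityInfo &BPI,
>>                                DominatorTree &DT,
>>                                SmallVectorImpl<CandidateRegion> &Out) {
>>   const BranchProbability ColdProb(1, 100); // placeholder threshold
>>   for (BasicBlock &BB : F) {
>>     for (BasicBlock *Succ : successors(&BB)) {
>>       if (BPI.getEdgeProbability(&BB, Succ) > ColdProb)
>>         continue; // edge is not cold enough
>>       if (!Succ->getSinglePredecessor())
>>         continue; // skip: some dominating node has the single entry point
>>       CandidateRegion R;
>>       DT.getDescendants(Succ, R.Blocks); // Succ plus all blocks it dominates
>>       if (R.Blocks.back()->getSingleSuccessor())
>>         Out.push_back(R); // region exits through a single successor
>>     }
>>   }
>> }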
>>
>> We will need several heuristics to make sure we only outline in
>> cases where we are confident it will result in a performance gain. Things
>> such as the threshold for when a branch is considered cold, the minimum
>> number of times the predecessor node has to be executed for an edge to be
>> considered (a confidence factor), and the minimum size of the region to be
>> outlined (we can use the inlining cost analysis, as we currently do) will
>> require some level of tuning.
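>>
>> We expect these to surface as internal command-line options so they can be
>> tuned; a minimal sketch of the kind of knobs involved (names and default
>> values here are placeholders, not proposed final settings):
>>
>> #include "llvm/Support/CommandLine.h"
>> using namespace llvm;
>>
>> // Placeholder tuning knobs; names and defaults are illustrative only.
>> static cl::opt<unsigned> ColdBranchPercent(
>>     "cold-outline-branch-percent", cl::init(1), cl::Hidden,
>>     cl::desc("Branch probability (in percent) below which an edge is cold"));
>>
>> static cl::opt<unsigned> MinPredExecCount(
>>     "cold-outline-min-pred-count", cl::init(100), cl::Hidden,
>>     cl::desc("Minimum execution count of the predecessor block before an "
>>              "edge is even considered (confidence factor)"));
>>
>> static cl::opt<unsigned> MinOutlineRegionCost(
>>     "cold-outline-min-region-cost", cl::init(20), cl::Hidden,
>>     cl::desc("Minimum inline-cost-style size of a region worth outlining"));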
>>
>> Similar to the current implementation, we will attempt to inline
>> the leftover (hot) parts of the code, and if for some reason we cannot, we
>> discard the modified function and its outlined code.
>>
>> ### Code changes
>>
>> The current Partial Inlining code first clones the function of
>> interest and looks for a single set of blocks to outline. It then creates a
>> function from that set of blocks, and saves the outlined function and
>> outlined call site information as part of the function cloning container.
>> In order to outline multiple regions of the function, we will need to
>> change these containers to keep track of a list of regions to outline. We
>> will also need to update the cost analysis to take into account multiple
>> outlined functions.
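>>
>> A minimal sketch of what the extended bookkeeping might look like (the
>> struct and field names are made up for illustration, not the pass's actual
>> containers):
>>
>> #include "llvm/ADT/SmallVector.h"
>> #include "llvm/IR/Function.h"
>> #include "llvm/IR/Instructions.h"
>> using namespace llvm;
>>
>> // Hypothetical bookkeeping: one cloned function may now own several
>> // outlined regions instead of a single one.
>> struct OutlinedRegionInfo {
>>   Function *OutlinedFn = nullptr;     // function holding the extracted region
>>   CallInst *CallToOutlined = nullptr; // call left behind in the clone
>> };
>>
>> struct FunctionCloneInfo {
>>   Function *Clone = nullptr;
>>   SmallVector<OutlinedRegionInfo, 4> OutlinedRegions;
>> };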
>>
>> When a ProfileSummary is available, we should skip the code
>> that looks for early returns and go into new code that looks for cold
>> regions to outline. When a ProfileSummary is not available, we can fall
>> back to the existing code and look for early returns only.
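>>
>> A minimal sketch of that gating, assuming ProfileSummaryInfo is the source
>> of profile availability (the two helper functions are placeholders for the
>> new and existing paths):
>>
>> #include "llvm/Analysis/ProfileSummaryInfo.h"
>> #include "llvm/IR/Function.h"
>> using namespace llvm;
>>
>> bool outlineColdRegionsThenInline(Function &F); // hypothetical new path
>> bool peelEarlyReturnThenInline(Function &F);    // hypothetical existing path
>>
>> bool tryPartialInline(Function &F, ProfileSummaryInfo *PSI) {
>>   if (PSI && PSI->hasProfileSummary())
>>     return outlineColdRegionsThenInline(F); // profile-driven cold outlining
>>   return peelEarlyReturnThenInline(F);      // fall back to early returns
>> }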
>>
>> ### Tuning
>>
>> - The outlining heuristics will need to determine if a set of cold
>> blocks is large enough to warrant the overhead of a function call. We also
>> don't want the inliner to attempt to inline the outlined code later.
>> - The threshold for determining whether a block is cold will also
>> need to be tuned. If the profiling information is inaccurate, we will pay
>> the price of the additional call overhead when the supposedly cold code is
>> actually executed.
>> - The confidence factor, which can be viewed as the minimum number
>> of times the predecessor has to be executed in order for an edge to be
>> considered cold, should also be taken into account to avoid outlining code
>> paths we have little information on.
>>
>> Graham Yiu
>> LLVM Compiler Development
>> IBM Toronto Software Lab
>> Office: (905) 413-4077 C2-407/8200/Markham
>> Email: gyiu at ca.ibm.com
>>