[llvm-dev] [RFC] Enhance Partial Inliner by using a general outlining scheme for cold blocks
River Riddle via llvm-dev
llvm-dev at lists.llvm.org
Tue Aug 15 16:26:59 PDT 2017
On Tue, Aug 15, 2017 at 4:22 PM Xinliang David Li <xinliangli at gmail.com>
wrote:
> On Tue, Aug 15, 2017 at 4:14 PM, River Riddle via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hey Graham,
>> I worked on pretty much this exact thing last year. I did something
>> similar to what you described, I traversed the CFG and built potentially
>> profitable regions from any given valid start node. At that point there
>> were several road blocks that prevented it from really going anywhere( like
>> the inliner not preserving pgo data) but those problems should be gone by
>> now.
>> Some thoughts:
>>
>> : In terms of cost modeling, you will want to consider the interaction
>> with loops:
>> - Whats the impact of outlining from inside of a loop.
>> - What if the hot region does not include the entry/exit, i.e hot loop.
>>
>> : Something that would also benefit, but would require more tuning, is
>> the relationship between size and "hotness". If for example you have a
>> function that looks like this:
>>
>> foo(bool B) {
>> ... // Code that prevents early return.
>>
>> if(B) // RegionA : Taken 50% of the time.
>> ... Very large amount of code.
>>
>> else // RegionB : Taken 50% of the time.
>> ... Very small amount of code.
>> }
>>
>> In the above you have two regions that aren't exactly cold. Outlining
>> RegionA will make foo profitable to inline, but only if you take into
>> account its large size. Using profile data as the sole cutoff will miss
>> this opportunity.
>>
>>
>
> Modelling the cost/benefit for this case can be very tricky though.
>
Most definitely, but it's something to think about when moving forward.
>
>
>>
>> : You may also want to look into hot switch statement splitting, though I
>> don't know if it should really go into the partial inliner. This is
>> splitting a switch statement into multiple switch statements if there are
>> only a few cases that are actually hot.
>>
>> SWITCH1:
>> switch(val) {
>> HOT: ...
>> COLD1: ...
>> COLD2: ...
>> ....
>> COLDN: ...
>> default: ...
>> }
>>
>> would get transformed into:
>>
>
>
>>
>> SWITCH1:
>> switch(val) {
>> HOT: ...
>> default:
>> br SWITCH2:
>> }
>>
>> SWITCH2:
>> switch(val) {
>> COLD1: ...
>> COLD2: ...
>> ....
>> COLDN: ...
>> default: ...
>> }
>>
>>
>>
>
> This is profile based switch peeling -- which is a separate optimization
> from partial inlining.
>
> David
>
>
Ah thanks, I wasn't sure of the name.
River Riddle
>
>
>
>> On Tue, Aug 15, 2017 at 1:14 PM, Graham Yiu via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Hi Jessica,
>>>
>>> Thanks for the feedback.
>>>
>>> I believe the existing partial inliner pass does use some common
>>> utilities like the code extractor to do outlining.
>>>
>> AFAIK, Code extractor is the only IR level utility for outlining, but
>> it's not the best and definitely needs some improvement. It currently
>> returns outputs via live out pointer parameters, which means that outlined
>> functions can't be tagged with attributes like readonly.
>>
>> There were also some known problems with extracting in the presence of
>> critical edges, but I don't know if thats still the case.
>>
>>
>>> Is that what you're referring to? I don't recall seeing any other passes
>>> that has outlining other than the machine outliner, but I may have missed
>>> something.
>>>
>>> I briefly looked at River's RFC and it seems he's mainly utilizing
>>> outlining as a tool to reduce code size while maintaining code performance.
>>>
>>
>> The main point of the outliner I proposed in that RFC is benefiting code
>> size. The use of PGO data is to simply avoid detrimenting runtime
>> performance.
>>
>>
>>> Our proposal is to improve on the existing outlining scheme of the
>>> partial inliner to give us more opportunities to inline hot code. We will
>>> only do this type of general outlining with the presence of profiling or
>>> user data (such as builtin_expect).
>>>
>> With that said, I am interested in how River plans to handle live ranges
>>>
>>
>> It's a pretty difficult problem. I have estimation for register usage
>> within an outlined region, using something similar to the register cost
>> calculation in the loop vectorizer. I don't currently have anything for the
>> actual call to the function itself. I haven't found a nice enough solution
>> that can work for a large number of candidates.
>>
>> and heuristics he will use with profiling data available.
>>>
>> As I said above, the use of profile data, currently, is to avoid
>> regressing execution time. So the heuristics are basically "Is this too hot
>> to outline from". That outliner also has a completely different goal/cost
>> model, which makes applying it to this problem difficult.
>>
>> Thanks,
>> River Riddle
>>
>>
>>>
>>> I haven't looked at the machine outliner yet, but my understanding is it
>>> also prioritizes code size reduction over performance gains. Is my
>>> assumption correct? Do you think there's some tuning we can do in the
>>> machine outliner to tailor for performance improvements?
>>>
>>> Graham Yiu
>>> LLVM Compiler Development
>>> IBM Toronto Software Lab
>>> Office: (905) 413-4077 C2-407/8200/Markham
>>> Email: gyiu at ca.ibm.com
>>>
>>> [image: Inactive hide details for Jessica Paquette ---08/15/2017
>>> 02:45:06 PM---Hi Graham, Have you looked at using the existing outline]Jessica
>>> Paquette ---08/15/2017 02:45:06 PM---Hi Graham, Have you looked at using
>>> the existing outliner framework to implement partial inlining? I
>>>
>>> From: Jessica Paquette <jpaquette at apple.com>
>>> To: Graham Yiu <gyiu at ca.ibm.com>
>>> Cc: llvm-dev at lists.llvm.org
>>> Date: 08/15/2017 02:45 PM
>>> Subject: Re: [llvm-dev] [RFC] Enhance Partial Inliner by using a
>>> general outlining scheme for cold blocks
>>> Sent by: jpaquette at apple.com
>>> ------------------------------
>>>
>>>
>>>
>>> Hi Graham,
>>>
>>> Have you looked at using the existing outliner framework to implement
>>> partial inlining? In the past, people have combined outlining + inlining to
>>> get partial inlining “for free” (see "Function outlining and partial
>>> inlining” by Zhao and Amaral).
>>>
>>> There is the MachineOutliner. River Riddle has been working on an
>>> IR-level outliner as well. The MachineOutliner doesn’t currently consider
>>> the heat of a block, but might be worth experimenting with. It might be
>>> good to experiment with River’s work and the existing outlining pass to see
>>> if this functionality can effectively fall out of what we have already.
>>>
>>> Here’s the recent IR-level outlining thread for reference wrt River’s
>>> work: *http://lists.llvm.org/pipermail/llvm-dev/2017-July/115666.html*
>>> <http://lists.llvm.org/pipermail/llvm-dev/2017-July/115666.html>
>>>
>>> - Jessica
>>>
>>> On Aug 15, 2017, at 11:22 AM, Graham Yiu via llvm-dev <
>>> *llvm-dev at lists.llvm.org* <llvm-dev at lists.llvm.org>> wrote:
>>>
>>> Hello,
>>>
>>> My team and I are looking to do some enhancements in the partial
>>> inliner in opt. Would appreciate any feedback that folks might have.
>>>
>>> # Partial Inlining in LLVM opt
>>>
>>> ## Summary
>>>
>>> ### Background
>>>
>>> Currently, the partial inliner searches the first few blocks of
>>> the callee and looks for a branch to the return block (ie. early return).
>>> If found, it attempts to outline the rest of the slow (or heavy) code so
>>> the inliner will be able to inline the fast (or light) code. If no early
>>> returns are found, the partial inliner will give up. As far as I can tell,
>>> BlockFrequency and BranchProbability information is only used when
>>> attempting to inline the early return code, and not used to determine
>>> whether to outline the slow code.
>>>
>>> ### Proposed changes
>>>
>>> In addition to looking for early returns, we should utilize
>>> profile information to outline blocks that are considered cold. If we can
>>> sufficiently reduce the size of the original function via this type of
>>> outlining, inlining should be able to inline the rest of the hot code.
>>>
>>> ## Details
>>>
>>> With the presence of profile information, we have a view of what
>>> code is infrequently executed and make better decisions on what to outline.
>>> Early return blocks that are infrequently executed should still be included
>>> as candidates for outlining, but will be treated just like any other cold
>>> blocks. Without profiling information, however, we should remain
>>> conservative and only partial inline in the presence of an early return in
>>> the first few blocks of a function (ie. peel the early return out of the
>>> function).
>>>
>>> To find cold regions to outline, we will traverse the CFG to find
>>> edges deemed 'cold' and look at the blocks dominated by the successor node.
>>> If, for some reason, that block has more than one predecessor, then we will
>>> skip this candidate as there should be a node that dominates this successor
>>> that has a single entry point. The last node in the dominance vector should
>>> also have a single successor. If it does not, then further investigation of
>>> the CFG is necessary to see when/how this situation occurs.
>>>
>>> We will need several heuristics to make sure we only outline in
>>> cases where we are confident it will result in a performance gain. Things
>>> such as threshold on when a branch is considered cold, the minimum number
>>> of times the predecessor node has to be executed in order for an edge to be
>>> considered (confidence factor), and the minimum size of the region to be
>>> outlined (can use inlining cost analysis like we currently do) will require
>>> some level of tuning.
>>>
>>> Similar to the current implementation, we will attempt to inline
>>> the leftover (hot) parts of the code, and if for some reason we cannot then
>>> we discard the modified function and its outlined code.
>>>
>>> ### Code changes
>>>
>>> The current Partial Inlining code first clones the function of
>>> interest and looks for a single set of blocks to outline. It then creates a
>>> function with the set the blocks, and saves the outlined function and
>>> outline callsite information as part of the function cloning container. In
>>> order to outline multiple regions of the function, we will need to change
>>> these containers to keep track of a list of regions to outline. We will
>>> also need to update the cost analysis to take into account multiple
>>> outlined functions.
>>>
>>> When a ProfileSummary is available, then we should skip the code
>>> that looks for early returns and go into new code that looks for cold
>>> regions to outline. When ProfileSummary is not available, then we can fall
>>> back to the existing code and look for early returns only.
>>>
>>> ### Tuning
>>>
>>> - The outlining heuristics will need to determine if a set of
>>> cold blocks is large enough to warrant the overhead of a function call. We
>>> also don't want the inliner to attempt to inline the outlined code later.
>>> - The threshold for determining whether a block is cold will also
>>> need to be tuned. In the case that profiling information is not accurate,
>>> we will pay the price of the additional call overhead for executing cold
>>> code.
>>> - The confidence factor, which can be viewed as the minimum
>>> number of times the predecessor has to be executed in order for an edge to
>>> be considered cold, should also be taken into account to avoid outlining
>>> code paths we have little information on.
>>>
>>> Graham Yiu
>>> LLVM Compiler Development
>>> IBM Toronto Software Lab
>>> Office: (905) 413-4077 C2-407/8200/Markham
>>> *Email: gyiu at ca.ibm.com* <gyiu at ca.ibm.com>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> *llvm-dev at lists.llvm.org* <llvm-dev at lists.llvm.org>
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>> --
Thank you,
River Riddle
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170815/3a83b212/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170815/3a83b212/attachment.gif>
More information about the llvm-dev
mailing list