[llvm-dev] [RFC] Enhance Partial Inliner by using a general outlining scheme for cold blocks

Tue Aug 29 10:52:44 PDT 2017

On Tue, Aug 29, 2017 at 10:24 AM, Graham Yiu via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hi Kader,
>
> I agree with you, if we were going to only do outlining for some functions
> and not immediately attempt to inline, it should be an independent pass.
> The partial inliner should do what its name suggests and attempt to inline
> or bail.
>
> I haven't looked through the CodeExtractor at all. I imagine I'll have to
> go through it at some point. I'd also be interested in something that does
> an analysis before code extraction that tells me how many live ranges I'm
> going to be killing or how many symbols I'm going to be taking the address
> of by extracting a specific region of code. Not sure if that currently
> exists.
>

CodeExtractor uses CodeExtractor::findInputsOutputs to determine the
inputs/outputs into the outlined region. It's a public utility and should
be roughly what you are looking for. It is used in conjunction with
CodeExtractor::findAllocas as it will try to hoist allocas that are only
used within the outlined region. Just look at
CodeExtractor::extractCodeRegion to see how they interact together. You
don't have to use them together, but you will lose information on some
unnecessary inputs.
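For readers unfamiliar with CodeExtractor, the input/output computation described above comes down to a simple rule. A value is an input if it is used inside the region but defined outside it. It is an output if it is defined inside the region and used outside it. The sketch below applies that rule to a hypothetical toy IR rather than LLVM's real classes. `Block` and `computeInputsOutputs` are illustrative names, not the CodeExtractor API, and the sketch ignores the alloca hoisting that findAllocas enables.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy IR (not LLVM's): each block records the values it
// defines and the values it uses. Values are assumed to be in SSA form,
// i.e. defined exactly once.
struct Block {
  std::set<std::string> defs;
  std::set<std::string> uses;
};

struct InputsOutputs {
  std::set<std::string> inputs;   // used in the region, defined outside it
  std::set<std::string> outputs;  // defined in the region, used outside it
};

// Region is a set of block indices into Fn.
InputsOutputs computeInputsOutputs(const std::vector<Block> &Fn,
                                   const std::set<int> &Region) {
  std::set<std::string> DefinedInside, UsedOutside;
  for (int I = 0; I < static_cast<int>(Fn.size()); ++I) {
    const Block &B = Fn[I];
    if (Region.count(I))
      DefinedInside.insert(B.defs.begin(), B.defs.end());
    else
      UsedOutside.insert(B.uses.begin(), B.uses.end());
  }

  InputsOutputs IO;
  for (int I : Region) {
    assert(I >= 0 && I < static_cast<int>(Fn.size()) && "bad block index");
    for (const std::string &V : Fn[I].uses)
      if (!DefinedInside.count(V))
        IO.inputs.insert(V);  // would become an argument of the outlined function
    for (const std::string &V : Fn[I].defs)
      if (UsedOutside.count(V))
        IO.outputs.insert(V);  // would have to be handed back to the caller
  }
  return IO;
}
```

For example, if block 0 defines `a`, block 1 uses `a` and defines `b`, and block 2 uses `b`, then outlining block 1 yields inputs {a} and outputs {b}.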

Thanks,
 River Riddle

>
>
> Graham Yiu
> LLVM Compiler Development
> IBM Toronto Software Lab
> Office: (905) 413-4077 C2-707/8200/Markham
> Email: gyiu at ca.ibm.com
>
>
> From: keita abdoul-kader <abdoulk.keita at gmail.com>
> To: Xinliang David Li <xinliangli at gmail.com>
> Cc: Graham Yiu <gyiu at ca.ibm.com>, llvm-dev <llvm-dev at lists.llvm.org>
> Date: 08/29/2017 12:15 PM
>
> Subject: Re: [llvm-dev] [RFC] Enhance Partial Inliner by using a general
> outlining scheme for cold blocks
> ------------------------------
>
>
>
> I agree that a way to outline specific function regions
> independently of the partial inliner sounds very useful. I am not sure,
> however, whether we would want a mode within the partialInliner or something
> completely independent.
>
> As a general question, does anybody have a clear idea of what the
> constraints are on the regions CodeExtractor is currently able to handle?
> Going through the code, it looks like the only requirement is for the
> header to dominate all the BBs in the region.
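The constraint Keita describes can be checked with textbook dominator sets. The sketch below works on a hypothetical toy CFG in which block 0 is the entry and `Succs[b]` lists the successors of `b`. It computes dominators with the classic iterative dataflow algorithm and then tests whether the header dominates every block of the region. It checks only the condition stated above. It is not CodeExtractor's actual validity check, which may impose further restrictions.

```cpp
#include <cassert>
#include <set>
#include <vector>

using CFG = std::vector<std::vector<int>>;  // Succs[b]; block 0 is the entry

// Iterative dataflow: Dom(entry) = {entry},
// Dom(b) = {b} union the intersection of Dom(p) over all predecessors p.
std::vector<std::set<int>> computeDominators(const CFG &Succs) {
  const int N = static_cast<int>(Succs.size());
  assert(N > 0 && "CFG needs an entry block");
  std::vector<std::vector<int>> Preds(N);
  for (int B = 0; B < N; ++B)
    for (int S : Succs[B])
      Preds[S].push_back(B);

  std::set<int> All;
  for (int B = 0; B < N; ++B)
    All.insert(B);
  std::vector<std::set<int>> Dom(N, All);  // start from "everything"
  Dom[0] = {0};

  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (int B = 1; B < N; ++B) {
      std::set<int> NewDom = All;
      for (int P : Preds[B]) {
        std::set<int> Meet;
        for (int D : NewDom)
          if (Dom[P].count(D))
            Meet.insert(D);
        NewDom = Meet;
      }
      NewDom.insert(B);
      if (NewDom != Dom[B]) {
        Dom[B] = NewDom;
        Changed = true;
      }
    }
  }
  return Dom;
}

// The condition described above: Header dominates every block in Region.
bool headerDominatesRegion(const CFG &Succs, int Header,
                           const std::set<int> &Region) {
  assert(Region.count(Header) && "header must be part of the region");
  std::vector<std::set<int>> Dom = computeDominators(Succs);
  for (int B : Region)
    if (!Dom[B].count(Header))
      return false;
  return true;
}
```

Unreachable blocks end up "dominated by everything" in this formulation, which is the usual convention.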
>
> On Sat, Aug 26, 2017 at 9:52 AM, Xinliang David Li via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
>
>    On Thu, Aug 24, 2017 at 12:47 PM, Graham Yiu <gyiu at ca.ibm.com> wrote:
>    Hi David,
>
>    The only reason I can see to use the 'pattern matching' part as a
>    fall-back is in case we cannot inline the (what I'm assuming would be
>    much bigger) hot-path-only cloned function for whatever reason. What I'm
>    assuming here is that after cold-region outlining, we may still have a
>    large portion of the original function body to attempt to inline, whereas
>    with the pattern matching method the part to inline will only contain a few
>    basic blocks, giving a better chance to inline something.
>
>
>    With profile data, the overhead of outlining a cold region can be
>    estimated more accurately. With the new PM, the threshold for inlining a
>    hot callsite is also much higher. Without profile data, the pattern matching
>    method won't work well in general: even though it can enable more
>    inlining, the overhead introduced by calling the outlined function
>    may outweigh the benefit of inlining the caller.
>
>    Whatever region can be found by the pattern matching method
>    should be identified by the new method as well. If multiple (but
>    mutually exclusive) candidate regions are found, the cost analysis heuristic
>    should pick the best candidate region for outlining.
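When candidates may overlap, one simple way to "pick the best candidate" is a greedy selection by score, sketched below under stated assumptions. `Candidate`, its `Score` field, and whatever cost model would produce that score are hypothetical. Greedy selection is only a heuristic, and it is not guaranteed to find the best combination when many candidates overlap.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Hypothetical candidate region: the blocks it covers and a score from some
// cost model (e.g. estimated benefit minus outlining overhead).
struct Candidate {
  std::set<int> Blocks;
  int Score;
};

static bool overlaps(const std::set<int> &A, const std::set<int> &B) {
  for (int X : A)
    if (B.count(X))
      return true;
  return false;
}

// Greedy heuristic: visit candidates from best to worst score and keep each
// profitable one that does not share a block with an already-kept candidate.
std::vector<Candidate> pickRegions(std::vector<Candidate> Cands) {
  std::stable_sort(Cands.begin(), Cands.end(),
                   [](const Candidate &L, const Candidate &R) {
                     return L.Score > R.Score;
                   });
  std::vector<Candidate> Chosen;
  for (const Candidate &C : Cands) {
    if (C.Score <= 0)
      break;  // sorted by score, so no later candidate is profitable either
    bool Clashes = std::any_of(Chosen.begin(), Chosen.end(),
                               [&](const Candidate &K) {
                                 return overlaps(C.Blocks, K.Blocks);
                               });
    if (!Clashes)
      Chosen.push_back(C);
  }
  return Chosen;
}
```

Using region size as the score instead would give the "prefer the largest candidate region" behaviour suggested for an outlining-only mode.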
>
>
>
>    Regarding your point (2), I think we'll have to be careful here. Without a
>    sense of how likely we are to inline the new function, we'll have to
>    make sure our outlining of cold regions does not degrade the performance of
>    the function in 99.xx% of the cases, as it's unclear how much performance
>    we'll gain from outlining alone (without inlining to increase the odds of
>    some performance gain). My initial thought was to ditch the new function
>    and its outlined children if we cannot immediately inline it.
>
>
>    The outlining-only mode is useful to enable more aggressive inlining
>    by the regular inlining pass. Slightly different heuristics can be applied
>    here. For instance, it can prefer the largest candidate region (to maximize
>    the chance of inlining the caller). The outlined region does not need to be
>    super cold; we can leave it to the inliner to do deeper analysis and decide
>    whether to inline it right back in.
>
>    David
>
>
>
>
>
>
>    From: Xinliang David Li <xinliangli at gmail.com>
>    To: Graham Yiu <gyiu at ca.ibm.com>
>    Cc: llvm-dev <llvm-dev at lists.llvm.org>
>    Date: 08/24/2017 03:05 PM
>
>
>    Subject: Re: [llvm-dev] [RFC] Enhance Partial Inliner by using a
>    general outlining scheme for cold blocks
>    ------------------------------
>
>
>
>
>
>    On Thu, Aug 24, 2017 at 10:40 AM, Graham Yiu <gyiu at ca.ibm.com> wrote:
>       Hi David,
>
>          So I've begun implementing the outlining portion
>          of the code. Currently, I have the partial inliner outlining cold regions
>          (single entry, single exit) of the code, based solely on the existence of
>          ProfileSummaryInfo (i.e., profiling data). However, I have some concerns about
>          how this will co-exist with the existing code that peels early returns.
>
>          The control flow looks something like this:
>
>          // New code: find cold regions to outline
>          if (!computeOutliningInfoForColdRegions()) {
>            // If we can't find any cold regions, fall back to early-return
>            // peeling
>            if (!computeOutliningInfo()) {
>              return nullptr;
>            }
>          }
>          // Try to outline the identified regions
>          // Then try to inline the cloned function
>
>          My concern is that during inlining, if we fail to inline the cloned
>          function, we give up and discard all cloned and outlined functions. But
>          with the two types of outlining we're doing, it's possible to attempt to
>          inline the cloned function that has outlined cold regions and, if we cannot
>          do so, try to inline a different clone that has peeled early returns (i.e.,
>          the way we do it today). This would require us to clone the original
>          function twice and modify one based on cold region outlining and the other
>          based on early return peeling, with the latter being our fall-back option
>          if we fail to inline the first clone.
>
>          What are your thoughts?
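The fallback order proposed here can be sketched as follows. The cold-region clone is tried first and the early-return clone second, and both are discarded if neither can be inlined. `Clone`, the two factory callbacks, and the `CanInline` predicate are hypothetical stand-ins for the pass's cloning, outlining, and inline-cost machinery. In a real pass, rejected clones and their outlined functions would also have to be erased.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

// Hypothetical stand-in for a cloned function after one outlining scheme.
struct Clone {
  std::string Strategy;  // which outlining scheme produced this clone
  int InlineCost;        // inline cost estimate of what is left in the clone
};

using CloneFactory = std::function<std::optional<Clone>()>;

// Try the cold-region clone first and the early-return clone second; return
// the first one the inliner accepts, or nothing if neither can be inlined.
// Rejected clones are simply dropped here.
std::optional<Clone> tryPartialInline(
    const CloneFactory &OutlineColdRegions, const CloneFactory &PeelEarlyReturn,
    const std::function<bool(const Clone &)> &CanInline) {
  for (const CloneFactory *Make : {&OutlineColdRegions, &PeelEarlyReturn}) {
    std::optional<Clone> C = (*Make)();  // empty if nothing could be outlined
    if (C && CanInline(*C))
      return C;
  }
  return std::nullopt;
}
```

The ordering makes the early-return clone the fall-back option. Its cost is a second clone of the original function whenever the first attempt fails.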
>
>
>    I expect computeOutliningInfoForColdRegions can produce a superset
>    of the regions outlinable by the current 'pattern matching' approach. In
>    other words, most of the cases currently caught by 'computeOutliningInfo'
>    should be caught by the new algorithm, so why not ditch the current
>    'computeOutliningInfo' completely?
>
>    My suggestion was to enhance the pass to 1) support outlining multiple
>    regions; and 2) add a mode that does function outlining only (not the
>    inlining part). The second is important because it can be used before the
>    regular inliner pass. With the new pass manager and profile-aware inlining,
>    the inliner won't undo the outlining decision, but in the meantime becomes
>    more powerful due to the reduced hot function size.
>
>    David
>
>
>
>
>          From: Graham Yiu/Toronto/IBM
>          To: Xinliang David Li <xinliangli at gmail.com>
>          Cc: llvm-dev <llvm-dev at lists.llvm.org>
>          Date: 08/15/2017 08:04 PM
>          Subject: Re: [llvm-dev] [RFC] Enhance Partial Inliner by using a
>          general outlining scheme for cold blocks
>          ------------------------------
>
>
>          Hey David,
>
>          Yes, we'll need to consider the effect on live ranges for
>          regions we want to outline. In my experience, outlining live-exit regions
>          seems to cause the most harm, as it ruins chances to keep data in
>          registers, as you were alluding to. It's unclear, however, what the exact
>          effect of outlining regions with live-entries would be.
>
>          I'll probably try to avoid regions that are not single-entry and
>          single-exit, at least initially, to simplify the transformation and
>          analysis. Are multi-exit regions common in your experience?
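A single-entry/single-exit restriction like the one proposed here can be tested by classifying CFG edges. In this minimal sketch on a hypothetical toy CFG, the region is single-entry if only the header receives edges from outside it. It is single-exit if every edge leaving the region targets the same block. As written, the sketch rejects a region that has no exit edge because its blocks only return.

```cpp
#include <cassert>
#include <set>
#include <vector>

using CFG = std::vector<std::vector<int>>;  // Succs[b] = successors of block b

// Single entry: the only region block receiving outside edges is Header.
// Single exit: every edge leaving the region goes to the same block.
bool isSingleEntrySingleExit(const CFG &Succs, int Header,
                             const std::set<int> &Region) {
  assert(Region.count(Header) && "header must be part of the region");
  std::set<int> ExitTargets;
  for (int B = 0; B < static_cast<int>(Succs.size()); ++B) {
    bool FromInside = Region.count(B) != 0;
    for (int S : Succs[B]) {
      bool ToInside = Region.count(S) != 0;
      if (!FromInside && ToInside && S != Header)
        return false;  // a second way into the region
      if (FromInside && !ToInside)
        ExitTargets.insert(S);
    }
  }
  // A region that only returns has no exit edge and is rejected here.
  return ExitTargets.size() == 1;
}
```

Relaxing the final check to allow several exit targets would admit single-entry/multiple-exit regions. The caller would then need glue code to branch on which exit was taken.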
>
>          And of course, I agree, we should reuse as much of the current
>          partial inlining infrastructure as possible. I'll likely run some ideas by
>          you as I begin to make changes.
>
>          Cheers,
>
>
>
>
>          From: Xinliang David Li <xinliangli at gmail.com>
>          To: Graham Yiu <gyiu at ca.ibm.com>
>          Cc: llvm-dev <llvm-dev at lists.llvm.org>
>          Date: 08/15/2017 05:36 PM
>          Subject: Re: [llvm-dev] [RFC] Enhance Partial Inliner by using a
>          general outlining scheme for cold blocks
>          ------------------------------
>
>
>
>
>          Hi Graham, Making partial inlining more general is something
>          worth doing. Regarding your implementation plan, I have some
>          suggestions:
>
>          *) Function outlining introduces additional runtime cost:
>          passing live-in values, returning live-out values (via memory), glue
>          code in the caller to handle regions without a single exit block, etc.
>          The cost analysis needs to factor these in carefully.
>          *) Remove the limitation that there is only *one* outlined
>          routine. Instead, the algorithm can compute multiple single-entry/single-exit
>          or single-entry/multiple-exit (cold) regions in the routine, and
>          outline each region into its own function. The benefits include
>             1) a simpler design and implementation, with most of the
>          existing code reused;
>             2) more flexibility to allow the most effective outlining;
>             3) reduced runtime overhead of making calls to the outlined
>          functions.
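The first point can be made concrete by modelling the runtime overhead of outlining a region as a per-execution cost, scaled by the region's profile count. The weights below are illustrative placeholders that would need tuning, not LLVM values. `RegionSummary`, `outliningOverhead`, and `expectedOverhead` are hypothetical names. The live-in and live-out counts would come from an input/output analysis such as CodeExtractor's.

```cpp
#include <cassert>

// Hypothetical cost weights; placeholders only.
struct OutlineCostParams {
  int CallCost = 5;      // call + return
  int PerLiveIn = 1;     // passing one live-in value as an argument
  int PerLiveOut = 2;    // store in the callee + reload in the caller (via memory)
  int PerExtraExit = 2;  // caller-side glue per exit beyond the first
};

struct RegionSummary {
  int LiveIns;
  int LiveOuts;
  int NumExits;
};

// Extra work paid each time the outlined region actually executes.
int outliningOverhead(const RegionSummary &R,
                      const OutlineCostParams &P = OutlineCostParams()) {
  assert(R.NumExits >= 1 && "region must have at least one exit");
  return P.CallCost + R.LiveIns * P.PerLiveIn + R.LiveOuts * P.PerLiveOut +
         (R.NumExits - 1) * P.PerExtraExit;
}

// Expected dynamic overhead: per-execution cost times the region's
// execution count taken from the profile.
long long expectedOverhead(const RegionSummary &R, long long RegionCount,
                           const OutlineCostParams &P = OutlineCostParams()) {
  return RegionCount * outliningOverhead(R, P);
}
```

With these placeholder weights, a region with 2 live-ins, 1 live-out, and 2 exits costs 5 + 2 + 2 + 2 = 11 units per execution. A cold region keeps the product with its execution count small.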
>
>          thanks,
>
>          David
>
>          On Tue, Aug 15, 2017 at 11:22 AM, Graham Yiu via llvm-dev <
>          llvm-dev at lists.llvm.org> wrote:
>             Hello,
>
>                      My team and I are looking to do some enhancements in
>                      the partial inliner in opt. Would appreciate any feedback that folks might
>                      have.
>
>                      # Partial Inlining in LLVM opt
>
>                      ## Summary
>
>                      ### Background
>
>                      Currently, the partial inliner searches the first
>                      few blocks of the callee and looks for a branch to the return block (i.e.,
>                      an early return). If found, it attempts to outline the rest of the slow (or
>                      heavy) code so the inliner will be able to inline the fast (or light) code.
>                      If no early returns are found, the partial inliner will give up. As far as
>                      I can tell, BlockFrequency and BranchProbability information is only used
>                      when attempting to inline the early return code, and not to determine
>                      whether to outline the slow code.
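The early-return search described in the Background section can be pictured with the following simplified sketch on a hypothetical toy CFG. It follows a straight chain from the entry for at most `MaxBlocks` blocks and stops at the first two-way branch. It illustrates the idea only and is not a model of LLVM's existing `computeOutliningInfo`.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical toy function: Succs[b] lists the successors of block b,
// block 0 is the entry, and IsReturn[b] marks blocks that end in a return.
struct ToyFunction {
  std::vector<std::vector<int>> Succs;
  std::vector<bool> IsReturn;
};

// Follow the straight-line chain from the entry for at most MaxBlocks blocks.
// Report the first block ending in a two-way branch where one successor is a
// return block; the other successor is where the code to outline would start.
std::optional<int> findEarlyReturn(const ToyFunction &F, int MaxBlocks) {
  assert(!F.Succs.empty() && F.Succs.size() == F.IsReturn.size());
  int B = 0;
  for (int Step = 0; Step < MaxBlocks; ++Step) {
    const std::vector<int> &S = F.Succs[B];
    if (S.size() == 2) {
      if (F.IsReturn[S[0]] || F.IsReturn[S[1]])
        return B;
      return std::nullopt;  // simplification: give up at the first other split
    }
    if (S.size() != 1)
      return std::nullopt;  // reached a return or a multi-way branch
    B = S[0];
  }
  return std::nullopt;  // nothing found within the first MaxBlocks blocks
}
```

Note that the search uses only CFG shape, never block frequencies. That matches the observation above that profile data does not drive the decision to outline.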
>
>                      ### Proposed changes
>
>                      In addition to looking for early returns, we should
>                      utilize profile information to outline blocks that are considered cold. If
>                      we can sufficiently reduce the size of the original function via this type
>                      of outlining, the inliner should be able to inline the rest of the hot code.
>
>                      ## Details
>
>                      With profile information available, we have a
>                      view of what code is infrequently executed and can make better decisions on
>                      what to outline. Early return blocks that are infrequently executed should
>                      still be included as candidates for outlining, but will be treated just
>                      like any other cold blocks. Without profiling information, however, we
>                      should remain conservative and only partially inline in the presence of an
>                      early return in the first few blocks of a function (i.e., peel the early
>                      return out of the function).
>
>                      To find cold regions to outline, we will traverse
>                      the CFG to find edges deemed 'cold' and look at the blocks dominated by the
>                      successor node. If that successor has more than one predecessor, we will
>                      skip this candidate, as there should be a node that dominates this
>                      successor and has a single entry point. The last node in the dominance
>                      vector should also have a single successor. If it does not, then further
>                      investigation of the CFG is necessary to see when and how this situation
>                      occurs.
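The region search above can be sketched as follows. The sketch assumes a hypothetical toy CFG and a precomputed list of cold edges, since deciding coldness is a separate heuristic. It relies on the fact that S dominates B exactly when B is reachable from the entry but becomes unreachable once S is removed. It also reads "the last node should have a single successor" as "all edges leaving the region go to one block". That is one possible interpretation, not necessarily the RFC's.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

using CFG = std::vector<std::vector<int>>;  // Succs[b]; block 0 is the entry

// Blocks reachable from the entry without passing through Avoid.
static std::set<int> reachableAvoiding(const CFG &Succs, int Avoid) {
  std::set<int> Seen;
  std::vector<int> Work;
  if (!Succs.empty() && Avoid != 0) {
    Seen.insert(0);
    Work.push_back(0);
  }
  while (!Work.empty()) {
    int B = Work.back();
    Work.pop_back();
    for (int S : Succs[B])
      if (S != Avoid && Seen.insert(S).second)
        Work.push_back(S);
  }
  return Seen;
}

// S dominates B iff B is reachable from the entry but not once S is removed.
static std::set<int> dominatedBy(const CFG &Succs, int S) {
  std::set<int> All = reachableAvoiding(Succs, -1);
  std::set<int> WithoutS = reachableAvoiding(Succs, S);
  std::set<int> Result;
  for (int B : All)
    if (!WithoutS.count(B))
      Result.insert(B);
  return Result;
}

// For each cold edge P->S: skip S if it has several predecessors, take the
// blocks dominated by S as the candidate region, and keep the region only if
// all edges leaving it go to a single block.
std::vector<std::set<int>> findColdRegions(
    const CFG &Succs, const std::vector<std::pair<int, int>> &ColdEdges) {
  std::vector<int> NumPreds(Succs.size(), 0);
  for (const std::vector<int> &Ss : Succs)
    for (int S : Ss)
      ++NumPreds[S];

  std::vector<std::set<int>> Regions;
  for (const std::pair<int, int> &E : ColdEdges) {
    int S = E.second;  // head of the cold path
    assert(S >= 0 && S < static_cast<int>(Succs.size()));
    if (NumPreds[S] != 1)
      continue;
    std::set<int> Region = dominatedBy(Succs, S);
    std::set<int> ExitTargets;
    for (int B : Region)
      for (int T : Succs[B])
        if (!Region.count(T))
          ExitTargets.insert(T);
    if (ExitTargets.size() == 1)
      Regions.push_back(Region);
  }
  return Regions;
}
```

For the CFG `{{1, 3}, {2}, {3}, {}}` with cold edges 0->1 and 0->3, the sketch returns the single region {1, 2}, which exits to block 3. The edge 0->3 is skipped because block 3 has two predecessors.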
>
>                      We will need several heuristics to make sure we only
>                      outline in cases where we are confident it will result in a performance
>                      gain. Things such as the threshold for considering a branch cold, the
>                      minimum number of times the predecessor node has to be executed in order
>                      for an edge to be considered (confidence factor), and the minimum size of
>                      the region to be outlined (we can use inlining cost analysis as we currently
>                      do) will require some level of tuning.
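The three knobs listed above might be wired together as shown below. The default values are arbitrary placeholders, not recommendations. `ColdOutliningKnobs`, `isColdEdge`, and `regionLargeEnough` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical knobs; the defaults are placeholders, not tuned numbers.
struct ColdOutliningKnobs {
  double ColdProbability = 0.05;    // edge taken less than 5% of the time
  std::uint64_t MinPredCount = 100; // confidence factor: min executions of pred
  unsigned MinRegionSize = 10;      // e.g. in instructions or inline-cost units
};

// P->S is cold if P ran often enough for the profile to be trusted and the
// edge was taken on only a small fraction of P's executions.
bool isColdEdge(std::uint64_t PredCount, std::uint64_t EdgeCount,
                const ColdOutliningKnobs &K = ColdOutliningKnobs()) {
  assert(EdgeCount <= PredCount && "an edge cannot run more often than its source");
  if (PredCount < K.MinPredCount)
    return false;  // too little profile data to be confident
  return static_cast<double>(EdgeCount) <
         K.ColdProbability * static_cast<double>(PredCount);
}

// Small regions are not worth the call overhead of outlining.
bool regionLargeEnough(unsigned RegionSize,
                       const ColdOutliningKnobs &K = ColdOutliningKnobs()) {
  return RegionSize >= K.MinRegionSize;
}
```

The confidence check runs before the ratio test on purpose. An edge taken 0 out of 50 times would otherwise look perfectly cold even though the sample is too small to trust.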
>
>                      Similar to the current implementation, we will
>                      attempt to inline the leftover (hot) parts of the code, and if for some
>                      reason we cannot, we discard the modified function and its outlined
>                      code.
>
>                      ### Code changes
>
>                      The current Partial Inlining code first clones the
>                      function of interest and looks for a single set of blocks to outline. It
>                      then creates a function from that set of blocks, and saves the outlined
>                      function and outlined callsite information as part of the function cloning
>                      container. In order to outline multiple regions of the function, we will
>                      need to change these containers to keep track of a list of regions to
>                      outline. We will also need to update the cost analysis to take into account
>                      multiple outlined functions.
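One rough picture of the container change keeps a list of per-region records instead of a single outlined function and call site per clone, and lets the cost analysis sum over that list. The types below are hypothetical stand-ins, with strings in place of `llvm::Function` and `llvm::BasicBlock` pointers, and the size accounting is deliberately simplistic.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record for one outlined region.
struct OutlinedRegion {
  std::string OutlinedFnName;  // function created for the region
  std::string CallSiteBlock;   // block in the clone that now calls it
  int CallOverhead;            // cost of the new call site
  int SizeRemoved;             // size moved out of the clone
};

// Per-clone container: a list of regions instead of a single one.
struct FunctionCloneInfo {
  std::string ClonedFnName;
  int OriginalSize = 0;
  std::vector<OutlinedRegion> Regions;

  // Size the inliner sees once every region has been outlined.
  int sizeAfterOutlining() const {
    int Size = OriginalSize;
    for (const OutlinedRegion &R : Regions)
      Size += R.CallOverhead - R.SizeRemoved;
    assert(Size >= 0 && "cannot remove more than the function contains");
    return Size;
  }

  // Total call overhead paid if every outlined region does run.
  int totalCallOverhead() const {
    int Sum = 0;
    for (const OutlinedRegion &R : Regions)
      Sum += R.CallOverhead;
    return Sum;
  }
};
```

With a single entry in `Regions`, this degenerates to the one-outlined-function case.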
>
>                      When a ProfileSummary is available, we should
>                      skip the code that looks for early returns and instead run new code that
>                      looks for cold regions to outline. When a ProfileSummary is not available,
>                      we can fall back to the existing code and look for early returns only.
>
>                      ### Tuning
>
>                      - The outlining heuristics will need to determine if
>                      a set of cold blocks is large enough to warrant the overhead of a function
>                      call. We also don't want the inliner to attempt to inline the outlined code
>                      later.
>                      - The threshold for determining whether a block is
>                      cold will also need to be tuned. If the profiling information is
>                      inaccurate, we will pay the price of the additional call overhead for
>                      executing cold code.
>                      - The confidence factor, which can be viewed as the
>                      minimum number of times the predecessor has to be executed in order for an
>                      edge to be considered cold, should also be taken into account to avoid
>                      outlining code paths we have little information on.
>
>
>
>
>
>
>
>
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>