[llvm-dev] [RFC] Enhance Partial Inliner by using a general outlining scheme for cold blocks

Tue Aug 15 13:14:51 PDT 2017

Hi Jessica,

Thanks for the feedback.

I believe the existing partial inliner pass does use some common utilities
like the code extractor to do outlining.  Is that what you're referring to?
I don't recall seeing any other passes that has outlining other than the
machine outliner, but I may have missed something.

I briefly looked at River's RFC and it seems he's mainly utilizing
outlining as a tool to reduce code size while maintaining code performance.
Our proposal is to improve on the existing outlining scheme of the partial
inliner to give us more opportunities to inline hot code.  We will only do
this type of general outlining with the presence of profiling or user data
(such as builtin_expect).  With that said, I am interested in how River
plans to handle live ranges and heuristics he will use with profiling data
available.

I haven't looked at the machine outliner yet, but my understanding is it
also prioritizes code size reduction over performance gains.  Is my
assumption correct?  Do you think there's some tuning we can do in the
machine outliner to tailor for performance improvements?

Graham Yiu
LLVM Compiler Development
IBM Toronto Software Lab
Office: (905) 413-4077      C2-407/8200/Markham
Email: gyiu at ca.ibm.com

From:	Jessica Paquette <jpaquette at apple.com>
To:	Graham Yiu <gyiu at ca.ibm.com>
Cc:	llvm-dev at lists.llvm.org
Date:	08/15/2017 02:45 PM
Subject:	Re: [llvm-dev] [RFC] Enhance Partial Inliner by using a general
            outlining scheme for cold blocks
Sent by:	jpaquette at apple.com

Hi Graham,

Have you looked at using the existing outliner framework to implement
partial inlining? In the past, people have combined outlining + inlining to
get partial inlining “for free” (see "Function outlining and partial
inlining” by Zhao and Amaral).

There is the MachineOutliner. River Riddle has been working on an IR-level
outliner as well. The MachineOutliner doesn’t currently consider the heat
of a block, but might be worth experimenting with. It might be good to
experiment with River’s work and the existing outlining pass to see if this
functionality can effectively fall out of what we have already.

Here’s the recent IR-level outlining thread for reference wrt River’s work:
http://lists.llvm.org/pipermail/llvm-dev/2017-July/115666.html

- Jessica

      On Aug 15, 2017, at 11:22 AM, Graham Yiu via llvm-dev <
      llvm-dev at lists.llvm.org> wrote:

      Hello,

      My team and I are looking to do some enhancements in the partial
      inliner in opt. Would appreciate any feedback that folks might have.

      # Partial Inlining in LLVM opt

      ## Summary

      ### Background

      Currently, the partial inliner searches the first few blocks of the
      callee and looks for a branch to the return block (ie. early return).
      If found, it attempts to outline the rest of the slow (or heavy) code
      so the inliner will be able to inline the fast (or light) code. If no
      early returns are found, the partial inliner will give up. As far as
      I can tell, BlockFrequency and BranchProbability information is only
      used when attempting to inline the early return code, and not used to
      determine whether to outline the slow code.

      ### Proposed changes

      In addition to looking for early returns, we should utilize profile
      information to outline blocks that are considered cold. If we can
      sufficiently reduce the size of the original function via this type
      of outlining, inlining should be able to inline the rest of the hot
      code.

      ## Details

      With the presence of profile information, we have a view of what code
      is infrequently executed and make better decisions on what to
      outline. Early return blocks that are infrequently executed should
      still be included as candidates for outlining, but will be treated
      just like any other cold blocks. Without profiling information,
      however, we should remain conservative and only partial inline in the
      presence of an early return in the first few blocks of a function
      (ie. peel the early return out of the function).

      To find cold regions to outline, we will traverse the CFG to find
      edges deemed 'cold' and look at the blocks dominated by the successor
      node. If, for some reason, that block has more than one predecessor,
      then we will skip this candidate as there should be a node that
      dominates this successor that has a single entry point. The last node
      in the dominance vector should also have a single successor. If it
      does not, then further investigation of the CFG is necessary to see
      when/how this situation occurs.

      We will need several heuristics to make sure we only outline in cases
      where we are confident it will result in a performance gain. Things
      such as threshold on when a branch is considered cold, the minimum
      number of times the predecessor node has to be executed in order for
      an edge to be considered (confidence factor), and the minimum size of
      the region to be outlined (can use inlining cost analysis like we
      currently do) will require some level of tuning.

      Similar to the current implementation, we will attempt to inline the
      leftover (hot) parts of the code, and if for some reason we cannot
      then we discard the modified function and its outlined code.

      ### Code changes

      The current Partial Inlining code first clones the function of
      interest and looks for a single set of blocks to outline. It then
      creates a function with the set the blocks, and saves the outlined
      function and outline callsite information as part of the function
      cloning container. In order to outline multiple regions of the
      function, we will need to change these containers to keep track of a
      list of regions to outline. We will also need to update the cost
      analysis to take into account multiple outlined functions.

      When a ProfileSummary is available, then we should skip the code that
      looks for early returns and go into new code that looks for cold
      regions to outline. When ProfileSummary is not available, then we can
      fall back to the existing code and look for early returns only.

      ### Tuning

      - The outlining heuristics will need to determine if a set of cold
      blocks is large enough to warrant the overhead of a function call. We
      also don't want the inliner to attempt to inline the outlined code
      later.
      - The threshold for determining whether a block is cold will also
      need to be tuned. In the case that profiling information is not
      accurate, we will pay the price of the additional call overhead for
      executing cold code.
      - The confidence factor, which can be viewed as the minimum number of
      times the predecessor has to be executed in order for an edge to be
      considered cold, should also be taken into account to avoid outlining
      code paths we have little information on.

      Graham Yiu
      LLVM Compiler Development
      IBM Toronto Software Lab
      Office: (905) 413-4077 C2-407/8200/Markham
      Email: gyiu at ca.ibm.com

      _______________________________________________
      LLVM Developers mailing list
      llvm-dev at lists.llvm.org
      http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170815/93e2a93b/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170815/93e2a93b/attachment.gif>