[llvm-dev] [RFC] Enable Partial Inliner by default

Graham Yiu via llvm-dev llvm-dev at lists.llvm.org
Fri Nov 3 13:00:50 PDT 2017


Hi David,

I think we should support multi-region outlining with and without PGO
information at some point.  However, the scope of the current patch
(D38190) should be with PGO information only, as we haven't come up with a
viable heuristic to outline multiple regions without PGO data.

The current single-region outlining (really, the 'tail' region of the
function) seems to have some potential in various workloads, so I think
it's worthwhile to turn it on by default to expose the optimization to
various platforms and get more feedback on modifications/improvements we
can add in the future.

Cheers,

Graham Yiu
LLVM Compiler Development
IBM Toronto Software Lab
Office: (905) 413-4077      C2-707/8200/Markham
Email: gyiu at ca.ibm.com



From:	Xinliang David Li <xinliangli at gmail.com>
To:	Graham Yiu <gyiu at ca.ibm.com>
Cc:	llvm-dev <llvm-dev at lists.llvm.org>, Jun Lim
            <junbuml at codeaurora.org>
Date:	11/03/2017 02:58 PM
Subject:	Re: [RFC] Enable Partial Inliner by default



Hi Graham, thanks for driving this. I assume the multi-region partial
inliner you are working on will eventually replace the current
single-region partial inliner and be turned on even without PGO. If that
is the plan, is it better to wait until that work is more complete, or
will the multi-region support only be used with profile feedback?

David


On Thu, Nov 2, 2017 at 2:26 PM, Graham Yiu <gyiu at ca.ibm.com> wrote:
  Hello,

  I'd like to propose turning on the partial inliner
  (-enable-partial-inlining) by default.

  We've seen small gains on SPEC2006/2017 runtimes as well as lnt
  compile-times with a 2nd stage bootstrap of LLVM. We also saw positive
  gains on our internal workloads.

  -------------------------------------
  Brief description of Partial Inlining
  -------------------------------------
  The partial inliner is a pass in opt that runs after the normal inlining
  pass. It looks for branches to a return block in the entry and immediate
  successor blocks of a function. If it finds one, it outlines the rest of
  the function using the CodeExtractor. It then attempts to inline the
  leftover entry block (and possibly one or more of its successors) into
  all of the function's callers. This effectively peels the early return
  block(s) into the caller, so they can execute without incurring the call
  overhead of a function that would just return immediately. Inlining and
  call overhead costs, as well as the branch probabilities of the return
  block(s), are taken into account before inlining is done. If inlining is
  not successful, the changes are discarded.

  e.g.

  void foo() {
    bar();
    // rest of the code in foo
  }

  void bar() {
    if (X)
      return;
    // rest of code (to be outlined)
  }

  After Partial Inlining:

  void foo() {
    if (!X)
      bar.outlined();
    // rest of the code in foo
  }

  void bar.outlined() {
    // rest of the code in bar
  }
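
  The example above can also be written as a small compilable C sketch.
  This is a hand-written illustration, not output of the pass: the
  transformation is applied manually, and every name in it (foo_before,
  foo_after, bar_outlined, the calls_made and work_done counters, run,
  check_partial_inlining_sketch) is hypothetical. bar_outlined stands in
  for bar.outlined, which is not a valid C identifier. The counters only
  make the saved call overhead observable.

```c
#include <assert.h>
#include <stdbool.h>

bool X;             /* stands in for the condition tested by bar */
int calls_made;     /* number of real calls that paid call overhead */
int work_done;      /* number of times the "rest of bar" executed */

/* Before partial inlining: every invocation is a call, even when bar
 * returns immediately. */
void bar(void) {
    calls_made++;
    if (X)
        return;
    work_done++;    /* rest of code (to be outlined) */
}

void foo_before(void) {
    bar();
}

/* After partial inlining (done by hand here): the early-return test lives
 * in the caller, and only the remainder of bar is still a call. */
void bar_outlined(void) {
    calls_made++;
    work_done++;    /* rest of the code in bar */
}

void foo_after(void) {
    if (!X)
        bar_outlined();
}

/* Run fn n times with X fixed, reporting the counters via out-parameters. */
void run(void (*fn)(void), bool x, int n, int *calls, int *work) {
    X = x;
    calls_made = 0;
    work_done = 0;
    for (int i = 0; i < n; i++)
        fn();
    *calls = calls_made;
    *work = work_done;
}

/* Both versions do the same work; they differ only in how many calls
 * are made when the early-return path is taken. */
void check_partial_inlining_sketch(void) {
    int c0, w0, c1, w1;

    run(foo_before, true, 100, &c0, &w0);
    run(foo_after, true, 100, &c1, &w1);
    assert(w0 == w1);               /* same observable work */
    assert(c0 == 100 && c1 == 0);   /* early returns no longer cost a call */

    run(foo_before, false, 100, &c0, &w0);
    run(foo_after, false, 100, &c1, &w1);
    assert(w0 == w1 && c0 == c1);   /* slow path is unchanged */
}
```

  Calling check_partial_inlining_sketch() from a main() exercises both
  paths. The benefit only shows up when X is true often enough, which is
  why branch probabilities matter to the cost decision described above.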


  Here are the numbers on a Power8 PPCLE running Ubuntu 15.04 in ST-mode:

  ----------------------------------------------
  Runtime performance (speed)
  ----------------------------------------------
  Workload          Improvement
  --------          -----------
  SPEC2006(C/C++)   0.06% (geomean)
  SPEC2017(C/C++)   0.10% (geomean)
  ----------------------------------------------
  Compile time performance for Bootstrapped LLVM
  ----------------------------------------------
  Workload          Improvement
  --------          -----------
  SPEC2006(C/C++)   0.41% (cumulative)
  SPEC2017(C/C++)   -0.16% (cumulative)
  lnt               0.61% (geomean)
  ----------------------------------------------
  Compile time performance
  ----------------------------------------------
  Workload          Increase
  --------          --------
  SPEC2006(C/C++)   1.31% (cumulative)
  SPEC2017(C/C++)   0.25% (cumulative)
  ----------------------------------------------
  Code size
  ----------------------------------------------
  Workload          Increase
  --------          --------
  SPEC2006(C/C++)   3.90% (geomean)
  SPEC2017(C/C++)   1.05% (geomean)

  NOTE1: Code size increase in SPEC2006 was mainly attributed to benchmark
  "astar", which increased by 86%. Removing this outlier, we get a more
  reasonable increase of 0.58%.
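
  The outlier arithmetic in NOTE1 can be sanity-checked. The sketch below
  rests on two assumptions that are not stated in this message: that the
  SPEC2006 C/C++ set contains 19 benchmarks, and that "increased by 86%"
  means a code size ratio of 1.86 for astar. Under those assumptions, one
  benchmark at 1.86 combined with 18 others at a geomean of 1.0058 should
  reproduce the overall geomean of 1.0390. The function names are
  hypothetical.

```c
#include <assert.h>

/* Raise base to a non-negative integer power by repeated multiplication
 * (avoids depending on libm's pow()). */
double int_pow(double base, int n) {
    double result = 1.0;
    for (int i = 0; i < n; i++)
        result *= base;
    return result;
}

/* A geomean g over n ratios means their product is g^n. Compare the product
 * rebuilt from "outlier + remaining benchmarks" with the product implied by
 * the reported overall geomean. A result near 1.0 means the numbers agree. */
double geomean_consistency_ratio(int n_benchmarks, double outlier_ratio,
                                 double geomean_without_outlier,
                                 double geomean_with_outlier) {
    double rebuilt = outlier_ratio *
                     int_pow(geomean_without_outlier, n_benchmarks - 1);
    double reported = int_pow(geomean_with_outlier, n_benchmarks);
    return rebuilt / reported;
}

/* Assumed inputs: 19 benchmarks, astar at 1.86x; 0.58% and 3.90% are the
 * figures quoted in the message. */
void check_reported_code_size_numbers(void) {
    double r = geomean_consistency_ratio(19, 1.86, 1.0058, 1.0390);
    assert(r > 0.99 && r < 1.01);   /* agree to within about 1% */
}
```

  With these inputs the two products agree to within about 1%, which is
  within the rounding of the quoted percentages. It also shows how much a
  single outlier can move a geomean over a set this small.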

  NOTE2: There is a patch up for review on Phabricator to enhance the
  partial inliner in the presence of profiling information
  (https://reviews.llvm.org/D38190).


  Graham Yiu
  LLVM Compiler Development
  IBM Toronto Software Lab
  Office: (905) 413-4077 C2-707/8200/Markham
  Email: gyiu at ca.ibm.com







