[PATCHES] A module inliner pass with a greedy call site queue

Mon Aug 4 19:52:33 PDT 2014

Yin,

I got the following "make check-all" failure.

/home/jialiu01/llvm/llvm/tools/clang/test/Driver/greedy-inliner.c:8:11:
error: expected string not found in input
// CHECK: Greedy Inliner
          ^
<stdin>:1:1: note: scanning from here
clang (LLVM option parsing): for the -print-after option: Cannot find
option named 'greedy-inliner'!
^

Can you confirm that is an issue?

And for performance, I haven't got the data on Cortex-A57 yet, and I will
let you know as soon as I get the result. As for Cortex-A53, I have not
tried it yet.

Thanks,
-Jiangning

2014-08-05 6:06 GMT+08:00 Yin Ma <yinma at codeaurora.org>:

> Hi All,
>
>
>
> Thanks to Jiangning for the comprehensive testing of the Greedy inliner. I
> am aware of Chandler's discussion about rewriting the pass manager in order
> to overcome the limitations of the current inliner, and of the intention to
> work toward the perfect solution.
>
>
>
> But we had to provide an inliner solution to address some LLVM performance
> degradation compared to GCC. That is how the greedy inliner was born. This
> inliner is a module pass, so it does not have the SCC<->function analysis
> problem. Note that the Greedy inliner is a flexible solution: it can be set
> up to use a bottom-up, top-down, or other custom order (that is the
> purpose of using a queue with sorted weights).
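
To make the "queue with sorted weights" idea concrete, here is a minimal Python sketch. It is not code from the patch: `CallSite`, `caller_depth`, and `drain` are names made up for illustration, and the weight functions are assumptions. It only shows how swapping the weight function changes the visiting order between top-down and bottom-up.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class CallSite:
    caller: str
    callee: str
    caller_depth: int  # hypothetical: distance of the caller from the call graph root


def drain(call_sites, weight):
    """Return (caller, callee) pairs in the order a weight-sorted queue visits them."""
    # heapq is a min-heap, so the weight is negated to pop the heaviest site first.
    # The index breaks ties deterministically and keeps CallSite objects out of comparisons.
    heap = [(-weight(cs), i, cs) for i, cs in enumerate(call_sites)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, cs = heapq.heappop(heap)
        order.append((cs.caller, cs.callee))
    return order


sites = [CallSite("main", "A", 0), CallSite("A", "B", 1), CallSite("B", "C", 2)]

# Favor call sites near the root: top-down order.
top_down = drain(sites, lambda cs: -cs.caller_depth)
# Favor call sites near the leaves: bottom-up order.
bottom_up = drain(sites, lambda cs: cs.caller_depth)
```

Any other policy (loop depth, profile data, code size) would plug in the same way, as a different `weight` callable.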
>
>
>
> Regarding code size, for our very large internal C++ code base, the Greedy
> inliner did a better job than the SCC inliner at -Os. It was able to
> inline more functions than the SCC inliner without increasing code size. In
> one instance, the files generated by the two inliners were quite similar
> in size. However, looking at the number of entries in the symbol
> table, the Greedy inliner version had 540880 entries, while the SCC inliner
> version had 619639 entries. This was achieved by setting weights to favor
> top-down order. Chandler, if you have any large code base examples in mind,
> I would like to try them.
>
>
>
> Regarding performance, the Greedy inliner has also shown better
> performance than the SCC inliner. I already reported the gains for
> SPEC2000 (eon 16%, mesa 5%) without any degradation of other tests.
> Jiangning also verified it independently. This was achieved by setting
> weights to favor call sites in loops.
>
>
>
> For virtual dispatch, we didn't see any C++ virtual dispatch problem
> exposed when evaluating the Greedy inliner, because the Greedy inliner
> reuses the SCC inliner to make the local decision. If anyone has a test
> case or program in mind for this, I can try to run it and report the findings.
>
>
>
> I like the suggestion from Hal to have a "more in-depth discussion on the
> goals of inlining, and how we do, or plan to, achieve them." Since we now
> have two concrete solutions for inliners, how about we have a BoF discussion
> at the LLVM dev conference? I can send a proposal.
>
>
>
> What do you guys think?
>
>
>
> Here are some details on the scenarios we considered when tuning the
> greedy inliner, and other possible future scenarios. The first one is the
> A <- B <- C case mentioned, where B is called in a loop of A. The B-to-A
> call site should have higher priority and be considered before the C-to-B
> call site.
>
> A() {
>
> For(..)  { Call B() }
>
> }
>
> B() {
>
>    call C()
>
> }
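
A small Python sketch of this first scenario, under stated assumptions: the weight is simply the loop depth of the call site, every popped call site is inlined, and the callee's own call sites are then re-queued in the caller with the depths added. None of this is taken from the patch; `greedy_inline_order` and `body_calls` are hypothetical names, and the sketch assumes no recursion.

```python
import heapq


def greedy_inline_order(body_calls):
    """body_calls maps a function name to the (callee, loop_depth) calls in its body.

    Returns the (caller, callee, loop_depth) call sites in the order they are visited.
    Assumes an acyclic call graph; recursion would make this loop forever.
    """
    heap, counter, order = [], 0, []
    for caller, calls in body_calls.items():
        for callee, depth in calls:
            # Negate the depth so the deepest loop nest is popped first.
            heapq.heappush(heap, (-depth, counter, caller, callee, depth))
            counter += 1
    while heap:
        _, _, caller, callee, depth = heapq.heappop(heap)
        order.append((caller, callee, depth))
        # Assumption: inlining copies the callee's call sites into the caller,
        # where they inherit the loop depth of the inlined call site.
        for inner_callee, inner_depth in body_calls.get(callee, []):
            new_depth = depth + inner_depth
            heapq.heappush(heap, (-new_depth, counter, caller, inner_callee, new_depth))
            counter += 1
    return order


# A calls B inside a loop (depth 1); B calls C outside any loop (depth 0).
order = greedy_inline_order({"A": [("B", 1)], "B": [("C", 0)]})
```

With these inputs the B-to-A site is visited first, then the copy of the call to C that now sits inside A's loop, and only then the original C-to-B site.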
>
>
>
> The second is that A calls B many times, and one call to B is in a loop
> and needs to be inlined. The B() inside the loop should have higher
> priority than the other calls to B in A(). The other calls to B may not be
> beneficial to inline when tuning for code size.
>
> A() {
>
> Call B()
>
> Call B()
>
> Call B()
>
> For(...) { Call B() }
>
> Call B()
>
> }
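
One way to picture this second scenario is a code size budget that is spent on the highest-weight call sites first. The Python sketch below is only an illustration: `pick_sites`, the site labels, and the size and budget numbers are all invented, and the real inliner's cost model is not described in this thread.

```python
def pick_sites(sites, callee_size, budget):
    """Choose call sites to inline, heaviest weight first, within a size budget.

    sites is a list of (label, loop_depth); loop_depth is used as the weight.
    """
    chosen = []
    # sorted() is stable, so call sites with equal weight keep their source order.
    for label, loop_depth in sorted(sites, key=lambda s: s[1], reverse=True):
        if callee_size <= budget:
            chosen.append(label)
            budget -= callee_size
    return chosen


# Five calls to B in A(); only the fourth one is inside a loop.
sites = [("B#1", 0), ("B#2", 0), ("B#3", 0), ("B#4 (loop)", 1), ("B#5", 0)]

# With room for a single copy of B, the call inside the loop gets it.
chosen = pick_sites(sites, callee_size=40, budget=50)
```

Processing the call sites in source order instead would spend the budget on `B#1` and leave the call in the loop un-inlined.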
>
>
>
> The third is a series of consecutive calls. On the architecture we
> target, we don't want to inline them. The inliner should have a global
> view when making the decision.
>
> A() {
>
> If (...) {
>
>       Call B();
>
>       Call B();
>
>       Call B();
>
>       Call B();
>
> }else if {
>
>       Call B();
>
>       Call B();
>
>       Call B();
>
>       Call B();
>
> }else ...
>
> ...
>
> }
>
>
>
> The next one is a future scenario that supports profile-based decisions. I
> considered this case but did not implement it in the current version of the
> greedy inliner. Block frequency info can be used in the computation to
> guide the order and the decision.
>
>
>
> The key to taking into account the top-down/bottom-up differences and the
> scenarios described above is to have an inliner framework that has the
> concept of a global queue with sorted weights. It is a very flexible
> framework. Any future LLVM inliner solution we decide on should support
> this type of feature.
>
>
>
> Yin
>
>
>
> *From:* llvm-commits-bounces at cs.uiuc.edu [mailto:
> llvm-commits-bounces at cs.uiuc.edu] *On Behalf Of *Chandler Carruth
> *Sent:* Sunday, August 03, 2014 11:51 PM
> *To:* Jiangning Liu
> *Cc:* Jiangning Liu; Commit Messages and Patches for LLVM
>
> *Subject:* Re: [PATCHES] A module inliner pass with a greedy call site
> queue
>
>
>
> Just a brief note...
>
>
>
> On Sun, Aug 3, 2014 at 11:42 PM, Jiangning Liu <liujiangning1 at gmail.com>
> wrote:
>
> 1. I measured the code size impact of Yin's patch; overall I don't see a
> code size regression.
>
>
>
> 1) For the following C++ programs in SPEC, we have the following data.
>
>
>
> -O2 result:
>
>
>
> spec           old_text_section  old_data_section  new_text_section  new_data_section  text_percentage  data_percentage
> 252.eon                  302848              2232            297301              2312           -1.83%            3.58%
> 450.soplex               366474              1536            389164              1656            6.19%            7.81%
> 453.povray               898032             12632            850444             12632           -5.30%            0.00%
> 471.omnetpp              685516              9136            693349              9128            1.14%           -0.09%
> 473.astar                 38999               860             41011               860            5.16%            0.00%
> 483.xalancbmk           4282478            139376           4414286            139376            3.08%            0.00%
> sum                     6574347            165772           6685555            165964            1.69%            0.12%
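
For anyone who wants to re-derive the percentage columns, the short Python sketch below recomputes them from the raw section sizes in the table, assuming each percentage is (new - old) / old. `pct_change` and `rows` are just illustrative names.

```python
# Section sizes copied from the table: (old_text, old_data, new_text, new_data).
rows = {
    "252.eon": (302848, 2232, 297301, 2312),
    "450.soplex": (366474, 1536, 389164, 1656),
    "453.povray": (898032, 12632, 850444, 12632),
    "471.omnetpp": (685516, 9136, 693349, 9128),
    "473.astar": (38999, 860, 41011, 860),
    "483.xalancbmk": (4282478, 139376, 4414286, 139376),
}


def pct_change(old, new):
    """Relative size change in percent, rounded to two decimals."""
    return round((new - old) / old * 100, 2)


# Per-benchmark (text %, data %) changes.
per_benchmark = {
    name: (pct_change(ot, nt), pct_change(od, nd))
    for name, (ot, od, nt, nd) in rows.items()
}

# Column totals, matching the "sum" row.
old_text, old_data, new_text, new_data = (sum(col) for col in zip(*rows.values()))
total = (pct_change(old_text, new_text), pct_change(old_data, new_data))

for name, (text_pct, data_pct) in per_benchmark.items():
    print(f"{name:15s} text {text_pct:+.2f}%  data {data_pct:+.2f}%")
print(f"{'sum':15s} text {total[0]:+.2f}%  data {total[1]:+.2f}%")
```

Note that the total is weighted by section size, so 483.xalancbmk dominates the summed text figure.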
>
>
> SPEC is highly misleading w.r.t. code size. Also, there are several
> regressions in code size in addition to improvements. It would be useful to
> get measurements from larger code bases.
>