[PATCHES] A module inliner pass with a greedy call site queue

Yin Ma yinma at codeaurora.org
Mon Aug 4 15:06:13 PDT 2014


Hi All,

 

Thanks to Jiangning for the comprehensive testing of the Greedy inliner. I am aware of Chandler's discussion about rewriting the pass manager in order to overcome the limitations of the current inliner, and of the intention to move toward an ideal solution.

 

But we had to provide an inliner solution to address some LLVM performance degradation compared to GCC; that is how the Greedy inliner was born. This inliner is a module pass, so it does not have the SCC<->function analysis problem. Note that the Greedy inliner is a flexible solution: it can be set up to inline in bottom-up, top-down, or other custom orders (that is the purpose of using a queue with sorted weights).

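To make the queue idea concrete, here is a minimal, self-contained sketch of the framework (not the patch itself; names such as CallSiteEntry, ByWeight, and runGreedyInliner are hypothetical, and the real pass works on LLVM call sites rather than plain ids):

#include <queue>
#include <vector>

struct CallSiteEntry {
  int Id;         // stands in for the real call-site handle
  double Weight;  // higher weight means the call site is considered earlier
};

struct ByWeight {
  bool operator()(const CallSiteEntry &A, const CallSiteEntry &B) const {
    return A.Weight < B.Weight;  // max-heap: largest weight popped first
  }
};

// All call sites in the module go into one global queue; the weight function
// alone decides whether the resulting order is bottom-up, top-down, or custom.
void runGreedyInliner(const std::vector<CallSiteEntry> &AllCallSites) {
  std::priority_queue<CallSiteEntry, std::vector<CallSiteEntry>, ByWeight> Queue;
  for (const CallSiteEntry &CS : AllCallSites)
    Queue.push(CS);

  while (!Queue.empty()) {
    CallSiteEntry CS = Queue.top();
    Queue.pop();
    // Local profitability check here (the patch reuses the SCC inliner's cost
    // logic for this), then inline and push any newly exposed call sites with
    // freshly computed weights.
  }
}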
 

Regarding code size, for our internal, very large C++ code base, the Greedy inliner did a better job than the SCC inliner at -Os. It was able to inline more functions than the SCC inliner without increasing code size. In one instance, the files generated by the two inliners were quite similar in size; however, looking at the number of entries in the symbol table, the Greedy inliner version had 540880 entries, while the SCC inliner version had 619639 entries. This was achieved by setting weights to favor a top-down order. Chandler, if you have any large code base examples in mind, I would like to try them.

 

Regarding performance, the Greedy inliner has also outperformed the SCC inliner. I already reported the gains for SPEC2000 (eon 16%, mesa 5%) without any degradation in the other tests, and Jiangning verified this independently. This was achieved by setting weights to favor call sites in loops.

 

For virtual dispatch, we did not see any C++ virtual dispatch problem exposed when evaluating the Greedy inliner, because the Greedy inliner reuses the SCC inliner's logic for the local decisions. If anyone has a test case or program in mind for this, I can run it and report the findings.

 

I like Hal's suggestion to have a "more in-depth discussion on the goals of inlining, and how we do, or plan to, achieve them." Since we now have two concrete inliner solutions, how about a BoF discussion at the LLVM developers' conference? I can send a proposal.

 

What do you guys think?

 

Here are some details on the scenarios we considered when tuning the Greedy inliner, plus some possible future scenarios. The first is the A <- B <- C case mentioned earlier, where the call to B is inside a loop in A. Inlining B into A should be a higher priority and be considered before inlining C into B:

A() {
  for (...) {
    B();   // call site in a loop: should be considered before C -> B
  }
}

B() {
  C();
}

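As an illustration of how the queue can encode this ordering (the helper and constants below are hypothetical and not the actual values used in the patch), a weight function can reward loop depth at the call site so that the B() call inside A's loop sorts ahead of the straight-line C() call inside B:

// Hypothetical static weight: call sites in deeper loops get larger weights
// and are therefore popped from the global queue earlier.
double computeWeight(unsigned LoopDepth, unsigned CalleeSize) {
  const double LoopBonus = 10.0;   // illustrative constant
  const double SizePenalty = 0.1;  // illustrative constant
  return LoopBonus * LoopDepth - SizePenalty * CalleeSize;
}

The same kind of weighting also covers the second scenario below, where only the call inside the loop is worth inlining when tuning for size.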
 

The second scenario is where A calls B many times and one of the calls is inside a loop and needs to be inlined. The B() call inside the loop should have a higher priority and be considered before the other B calls in A(); those other calls may not be beneficial to inline when tuning for code size:

A() {
  B();
  B();
  B();
  for (...) {
    B();
  }
  B();
}

 

The third scenario is a series of consecutive calls; on the architecture we targeted, we do not want to inline them. The inliner should make this decision with a global view:

A() {
  if (...) {
    B();
    B();
    B();
    B();
  } else if (...) {
    B();
    B();
    B();
    B();
  } else {
    ...
  }
}

 

The next one is a future scenario that supports profile-based decisions. I considered this case but have not implemented it in the current version of the Greedy inliner. Block frequency information can be used in the weight computation to guide both the order and the inlining decision.

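A minimal sketch of what that could look like, assuming the block frequency of the call site's parent block and the entry frequency of its caller are available (the scaling below is purely illustrative and not part of the current patch):

// Hypothetical profile-aware weight: scale a call site's static weight by how
// hot its enclosing block is relative to the caller's entry block.
double profileWeight(double StaticWeight, unsigned long long BlockFreq,
                     unsigned long long EntryFreq) {
  if (EntryFreq == 0)
    return StaticWeight;                  // no profile data: keep the static weight
  double Hotness = static_cast<double>(BlockFreq) / EntryFreq;
  return StaticWeight * (1.0 + Hotness);  // hotter call sites sort earlier
}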
 

The key to accounting for top-down/bottom-up differences and the scenarios described above is an inliner framework built around a global queue of call sites with sorted weights. It is a very flexible framework, and any future LLVM inliner solution we decide on should support this kind of feature.

 

Yin

 

From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Chandler Carruth
Sent: Sunday, August 03, 2014 11:51 PM
To: Jiangning Liu
Cc: Jiangning Liu; Commit Messages and Patches for LLVM
Subject: Re: [PATCHES] A module inliner pass with a greedy call site queue

 

Just a brief note...

 

On Sun, Aug 3, 2014 at 11:42 PM, Jiangning Liu <liujiangning1 at gmail.com> wrote:

1. I measured the code size impact of Yin's patch; overall I don't see a code size regression.

 

1) For the following C++ programs in SPEC, we have the following data.

 

-O2 result:

 

spec            old_text_section  old_data_section  new_text_section  new_data_section  text_percentage  data_percentage
252.eon                   302848              2232            297301              2312           -1.83%            3.58%
450.soplex                366474              1536            389164              1656            6.19%            7.81%
453.povray                898032             12632            850444             12632           -5.30%            0.00%
471.omnetpp               685516              9136            693349              9128            1.14%           -0.09%
473.astar                  38999               860             41011               860            5.16%            0.00%
483.xalancbmk            4282478            139376           4414286            139376            3.08%            0.00%
sum                      6574347            165772           6685555            165964            1.69%            0.12%


SPEC is highly misleading w.r.t. code size. Also, there are several regressions in code size in addition to improvements. It would be useful to get measurements from larger code bases.
