[PATCHES] A module inliner pass with a greedy call site queue

Jiangning Liu liujiangning1 at gmail.com
Thu Aug 7 00:14:00 PDT 2014


Hi Yin,

Sorry, I previously missed the command line option "-mllvm
-inline-perf-mode=true" because your test case doesn't use it. I have now
measured performance on Cortex-A57 again with the command line options
"-mllvm -greedy-inliner=true -mllvm -inline-perf-mode=true".

spec2000      greedy_inliner_perf  threshold_1000_perf  greedy_inliner_code_size  threshold_1000_code_size
164.gzip                    0.00%               -0.78%                     6.25%                    14.55%
175.vpr                    -4.09%               -3.14%                     1.84%                    14.49%
176.gcc                     0.83%                0.83%                     0.08%                    33.16%
181.mcf                     0.00%                0.00%                     3.58%                    19.58%
186.crafty                 -0.93%                1.85%                    -0.94%                    14.38%
197.parser                 -1.61%               -2.24%                    -0.04%                     1.48%
252.eon                    -7.30%               -6.52%                     2.64%                     6.42%
253.perlbmk                -2.38%               -3.76%                    -1.75%                     2.22%
254.gap                     0.00%               -1.72%                     2.93%                    18.44%
255.vortex                 -1.04%               -4.19%                     3.40%                    47.07%
256.bzip2                   1.40%               -1.83%                     2.23%                    10.11%
300.twolf                   1.87%               -0.36%                    -1.87%                    23.48%
177.mesa                   -3.36%               -2.52%                     0.48%                    35.04%
179.art                     1.37%                0.00%                     0.45%                     9.26%
183.equake                 -4.35%               -5.80%                     1.67%                    23.23%
188.ammp                    0.35%                1.75%                     0.07%                     6.69%

So now this performance result looks quite promising!

For xxx_perf, a negative number means running time is reduced, i.e.
performance is better.
For xxx_code_size, the number covers the .text section only.

From this result we can see:

1) The greedy inliner obtains a performance improvement similar to
setting the threshold to 1000 with the original inliner.
2) But compared with the significant code size bloat at threshold 1000,
the code size change of the greedy inliner is quite limited on average.
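
As a rough cross-check of the two points above, the sketch below takes the
plain arithmetic mean of each column of the table. The helper name `mean` and
the choice of an arithmetic mean (rather than, say, a SPEC-style geometric
mean of ratios) are assumptions made for illustration only; the numbers are
copied from the table in benchmark order.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean of a column of percentage deltas (hypothetical helper).
double mean(const std::vector<double>& column) {
    assert(!column.empty());
    return std::accumulate(column.begin(), column.end(), 0.0) /
           static_cast<double>(column.size());
}

// Columns copied from the table, in row order 164.gzip ... 188.ammp.
const std::vector<double> greedyPerf = {
    0.00, -4.09, 0.83, 0.00, -0.93, -1.61, -7.30, -2.38,
    0.00, -1.04, 1.40, 1.87, -3.36, 1.37, -4.35, 0.35};
const std::vector<double> threshold1000Perf = {
    -0.78, -3.14, 0.83, 0.00, 1.85, -2.24, -6.52, -3.76,
    -1.72, -4.19, -1.83, -0.36, -2.52, 0.00, -5.80, 1.75};
const std::vector<double> greedyCodeSize = {
    6.25, 1.84, 0.08, 3.58, -0.94, -0.04, 2.64, -1.75,
    2.93, 3.40, 2.23, -1.87, 0.48, 0.45, 1.67, 0.07};
const std::vector<double> threshold1000CodeSize = {
    14.55, 14.49, 33.16, 19.58, 14.38, 1.48, 6.42, 2.22,
    18.44, 47.07, 10.11, 23.48, 35.04, 9.26, 23.23, 6.69};
```

Under that assumption, the run-time means come out at roughly -1.2% (greedy)
and -1.8% (threshold 1000), while the .text size means are roughly 1.3% and
17.5% respectively.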

Thanks,
-Jiangning


2014-08-05 17:11 GMT+08:00 Jiangning Liu <liujiangning1 at gmail.com>:

> Hi Yin,
>
> I don't see a performance improvement on Cortex-A57 for eon with your
> patch. The spec2000/int data is below (negative is good):
>
> 164.gzip 0.00%
> 175.vpr -4.55%
> 176.gcc 0.83%
> 181.mcf 0.00%
> 186.crafty 0.00%
> 197.parser -2.26%
> 252.eon 1.46%
> 253.perlbmk 5.24%
> 254.gap 0.88%
> 255.vortex -0.52%
> 256.bzip2 0.47%
> 300.twolf 1.87%
>
> Thanks,
> -Jiangning
>
>
>
> 2014-08-05 10:52 GMT+08:00 Jiangning Liu <liujiangning1 at gmail.com>:
>
>> Yin,
>>
>> I got the following "make check-all" failure.
>>
>> /home/jialiu01/llvm/llvm/tools/clang/test/Driver/greedy-inliner.c:8:11:
>> error: expected string not found in input
>> // CHECK: Greedy Inliner
>>           ^
>> <stdin>:1:1: note: scanning from here
>> clang (LLVM option parsing): for the -print-after option: Cannot find
>> option named 'greedy-inliner'!
>> ^
>>
>> Can you confirm that is an issue?
>>
>> As for performance, I don't have the data on Cortex-A57 yet, and I will
>> let you know as soon as I get the result. I have never tried it on
>> Cortex-A53.
>>
>> Thanks,
>> -Jiangning
>>
>>
>>
>> 2014-08-05 6:06 GMT+08:00 Yin Ma <yinma at codeaurora.org>:
>>
>>> Hi All,
>>>
>>>
>>>
>>> Thanks to Jiangning for comprehensive testing of the Greedy inliner. I
>>> am aware of Chandler's discussion about rewriting the pass manager in
>>> order to overcome the limitations of the current inliner, and of the
>>> intention to work toward the perfect solution.
>>>
>>>
>>>
>>> But we had to provide an inliner solution to address some LLVM
>>> performance degradation compared to GCC. That is how the greedy inliner was
>>> born. Because this inliner is a module pass, it does not have the
>>> SCC<->function analysis problem. Note that the Greedy inliner is a flexible
>>> solution: it can be set up to work bottom-up, top-down, or in any other
>>> custom order (that is the purpose of using a queue with sorted weights).
>>>
>>>
>>>
>>> Regarding code size, for our very large internal C++ code base, the
>>> Greedy inliner did a better job than the SCC inliner at -Os. It was able
>>> to inline more functions than the SCC inliner without increasing code size.
>>> In one instance the files generated by the two inliners were of quite
>>> similar size. However, looking at the number of entries in the symbol
>>> table, the Greedy inliner version had 540880 entries, while the SCC inliner
>>> version had 619639 entries. This was achieved by setting weights to favor
>>> top-down order. Chandler, if you have any large code base examples in mind,
>>> I would like to try them.
>>>
>>>
>>>
>>> Regarding performance, the Greedy inliner has also shown better
>>> performance than the SCC inliner. I already reported the gains for
>>> SPEC2000 (eon 16%, mesa 5%) without any degradation of other tests.
>>> Jiangning also verified it independently. This was achieved by setting
>>> weights to favor call sites in loops.
>>>
>>>
>>>
>>> For virtual dispatch, we didn't see any C++ virtual dispatch problem
>>> exposed when evaluating the Greedy inliner, because the Greedy inliner
>>> reuses the SCC inliner to make the local decision. If anyone has a test
>>> case or program in mind for this, I can try to run it and report the
>>> findings.
>>>
>>>
>>>
>>> I like the suggestion from Hal to have a "more in-depth discussion on
>>> the goals of inlining, and how we do, or plan to, achieve them." Since we
>>> now have two concrete solutions for inliners, how about we have a BoF
>>> discussion at the LLVM dev conference? I can send a proposal.
>>>
>>>
>>>
>>> What do you guys think?
>>>
>>>
>>>
>>> Here are some details on the scenarios we considered when tuning the
>>> greedy inliner, and other possible future scenarios. The first is the
>>> A <- B <- C case mentioned earlier, where the call to B is in a loop in A.
>>> Inlining B into A should have higher priority and be considered before
>>> inlining C into B.
>>>
>>> A() {
>>>   for (...) { call B() }
>>> }
>>>
>>> B() {
>>>   call C()
>>> }
>>>
>>>
>>>
>>> The second is where A calls B many times and one of those calls is in a
>>> loop and needs to be inlined. The call to B() inside the loop should have
>>> higher priority than the other calls to B in A(). The other calls to B may
>>> not be beneficial to inline when tuning for code size.
>>>
>>> A() {
>>>   call B()
>>>   call B()
>>>   call B()
>>>   for (...) { call B() }
>>>   call B()
>>> }
>>>
>>>
>>>
>>> The third is a series of consecutive calls. On the architecture we
>>> target, we don't want to inline them. The inliner needs a global view to
>>> make this decision.
>>>
>>> A() {
>>>   if (...) {
>>>     call B();
>>>     call B();
>>>     call B();
>>>     call B();
>>>   } else if (...) {
>>>     call B();
>>>     call B();
>>>     call B();
>>>     call B();
>>>   } else ...
>>>   ...
>>> }
>>>
>>>
>>>
>>> The next one is a future scenario that supports profile-based decisions.
>>> I considered this case but have not implemented it in the current version
>>> of the greedy inliner. Block frequency info can be used in the weight
>>> computation to guide both the order and the decision.
>>>
>>>
>>>
>>> The key to taking into account the top-down/bottom-up differences and
>>> the scenarios described above is an inliner framework built around a
>>> global queue with sorted weights. It is a very flexible framework. Any
>>> future LLVM inliner solution we decide on should support this type of
>>> feature.
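
To make the "global queue with sorted weights" idea concrete, here is a
minimal sketch of how call sites might be ordered for the first two
scenarios. It is not taken from the patch: the `CallSite` record, the
`weight` function (loop depth only) and `visitOrder` are hypothetical names
chosen for illustration, and a real inliner would also consult the cost
analysis and re-queue call sites exposed by each inlining step.

```cpp
#include <queue>
#include <string>
#include <vector>

// Hypothetical call-site record; not the patch's actual data structure.
struct CallSite {
    std::string caller;
    std::string callee;
    unsigned loopDepth;  // loop nesting depth of the call inside the caller
};

// Hypothetical weight: call sites nested deeper in loops get higher priority.
unsigned weight(const CallSite& cs) { return cs.loopDepth; }

// Comparator for std::priority_queue: the largest weight ends up on top.
struct LowerWeight {
    bool operator()(const CallSite& a, const CallSite& b) const {
        return weight(a) < weight(b);
    }
};

// Drain a module-wide queue in weight order and return the visiting order
// as "caller->callee" strings. Ties are visited in unspecified order.
std::vector<std::string> visitOrder(const std::vector<CallSite>& sites) {
    std::priority_queue<CallSite, std::vector<CallSite>, LowerWeight> queue(
        LowerWeight{}, sites);
    std::vector<std::string> order;
    while (!queue.empty()) {
        const CallSite& cs = queue.top();
        order.push_back(cs.caller + "->" + cs.callee);
        queue.pop();
    }
    return order;
}
```

With the first scenario encoded as `{"A", "B", 1}` and `{"B", "C", 0}`, the
A->B call site is visited before B->C. Swapping in a different `weight`
function is what would give a bottom-up, top-down, or profile-guided order.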
>>>
>>>
>>>
>>> Yin
>>>
>>>
>>>
>>> *From:* llvm-commits-bounces at cs.uiuc.edu [mailto:
>>> llvm-commits-bounces at cs.uiuc.edu] *On Behalf Of *Chandler Carruth
>>> *Sent:* Sunday, August 03, 2014 11:51 PM
>>> *To:* Jiangning Liu
>>> *Cc:* Jiangning Liu; Commit Messages and Patches for LLVM
>>>
>>> *Subject:* Re: [PATCHES] A module inliner pass with a greedy call site
>>> queue
>>>
>>>
>>>
>>> Just a brief note...
>>>
>>>
>>>
>>> On Sun, Aug 3, 2014 at 11:42 PM, Jiangning Liu <liujiangning1 at gmail.com>
>>> wrote:
>>>
>>> 1. I measured the code size impact of Yin's patch; overall I don't see
>>> a code size regression.
>>>
>>>
>>>
>>> 1) For the following C++ programs in SPEC, we have the following data.
>>>
>>>
>>>
>>> -O2 result:
>>>
>>>
>>>
>>> spec           old_text_section  old_data_section  new_text_section  new_data_section  text_percentage  data_percentage
>>> 252.eon                  302848              2232            297301              2312           -1.83%            3.58%
>>> 450.soplex               366474              1536            389164              1656            6.19%            7.81%
>>> 453.povray               898032             12632            850444             12632           -5.30%            0.00%
>>> 471.omnetpp              685516              9136            693349              9128            1.14%           -0.09%
>>> 473.astar                 38999               860             41011               860            5.16%            0.00%
>>> 483.xalancbmk           4282478            139376           4414286            139376            3.08%            0.00%
>>> sum                     6574347            165772           6685555            165964            1.69%            0.12%
>>>
>>>
>>> SPEC is highly misleading w.r.t. code size. Also, there are several
>>> regressions in code size in addition to improvements. It would be useful to
>>> get measurements from larger code bases.
>>>
>>
>>
>