[PATCHES] A module inliner pass with a greedy call site queue

Tue Aug 5 02:11:24 PDT 2014

Hi Yin,

I don't see a performance improvement for eon on Cortex-A57 with your patch.
The SPEC2000 int data is below (negative is better):

164.gzip 0.00%
175.vpr -4.55%
176.gcc 0.83%
181.mcf 0.00%
186.crafty 0.00%
197.parser -2.26%
252.eon 1.46%
253.perlbmk 5.24%
254.gap 0.88%
255.vortex -0.52%
256.bzip2 0.47%
300.twolf 1.87%

Thanks,
-Jiangning

2014-08-05 10:52 GMT+08:00 Jiangning Liu <liujiangning1 at gmail.com>:

> Yin,
>
> I got the following "make check-all" failure.
>
> /home/jialiu01/llvm/llvm/tools/clang/test/Driver/greedy-inliner.c:8:11:
> error: expected string not found in input
> // CHECK: Greedy Inliner
>           ^
> <stdin>:1:1: note: scanning from here
> clang (LLVM option parsing): for the -print-after option: Cannot find
> option named 'greedy-inliner'!
> ^
>
> Can you confirm whether that is an issue?
>
> As for performance, I don't have the data on Cortex-A57 yet; I will let you
> know as soon as I get the results. I have never tried it on Cortex-A53.
>
> Thanks,
> -Jiangning
>
>
>
> 2014-08-05 6:06 GMT+08:00 Yin Ma <yinma at codeaurora.org>:
>
> Hi All,
>>
>>
>>
>> Thanks to Jiangning for the comprehensive testing of the Greedy inliner. I
>> am aware of Chandler's discussion about rewriting the pass manager in order
>> to overcome the limitations of the current inliner, and of the intention to
>> work toward the perfect solution.
>>
>>
>>
>> But we had to provide an inliner solution to address some LLVM
>> performance degradation compared to GCC. That is how the greedy inliner was
>> born. Because this inliner is a module pass, it does not have the
>> SCC<->function analysis problem. Note that the Greedy inliner is a flexible
>> solution: it can be set up to run bottom-up, top-down, or in another custom
>> order (that is the purpose of using a queue with sorted weights).
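
A minimal sketch of the "queue with sorted weights" idea, not the patch's actual code. The `CallSite` record, its `caller_depth` field, and the two weight functions are hypothetical names chosen for illustration; the point is that only the weight function changes between top-down and bottom-up order.

```python
import heapq
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class CallSite:
    caller: str
    callee: str
    caller_depth: int  # hypothetical: distance of the caller from the call-graph root


def drain(call_sites, weight):
    """Return (caller, callee) pairs in the order a greedy queue would visit them."""
    tie = count()  # unique tie-breaker keeps equal weights in insertion order
    # heapq is a min-heap, so negate the weight to pop the highest weight first
    heap = [(-weight(cs), next(tie), cs) for cs in call_sites]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, cs = heapq.heappop(heap)
        order.append((cs.caller, cs.callee))
    return order


sites = [CallSite("main", "A", 0), CallSite("A", "B", 1), CallSite("B", "C", 2)]
top_down = drain(sites, lambda cs: -cs.caller_depth)   # callers near the root first
bottom_up = drain(sites, lambda cs: cs.caller_depth)   # leaf-most callers first
```

Swapping in a different `weight` callable is all it takes to get a custom order, which is the flexibility the paragraph above describes.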
>>
>>
>>
>> Regarding code size, for our very large internal C++ code base, the
>> Greedy inliner did a better job than the SCC inliner at -Os. It was able
>> to inline more functions than the SCC inliner without increasing code size.
>> In one instance the files generated by the two inliners were of quite
>> similar size. However, looking at the number of entries in the symbol
>> table, the Greedy inliner version had 540880 entries, while the SCC inliner
>> version had 619639 entries. This was achieved by setting weights to favor
>> top-down order. Chandler, if you have any large code base examples in mind,
>> I would like to try them.
>>
>>
>>
>> Regarding performance, the Greedy inliner has also shown better
>> performance than the SCC inliner. I already reported the gains for
>> SPEC2000 (eon 16%, mesa 5%) without any degradation of other tests.
>> Jiangning also verified it independently. This was achieved by setting
>> weights to favor call sites in loops.
>>
>>
>>
>> For virtual dispatch, we didn't see any C++ virtual dispatch problem
>> exposed when evaluating the Greedy inliner, because the Greedy inliner
>> reuses the SCC inliner to make the local decision. If anyone has a test
>> case or program in mind for this, I can try to run it and report the
>> findings.
>>
>>
>>
>> I like the suggestion from Hal to have a "more in-depth discussion on the
>> goals of inlining, and how we do, or plan to, achieve them." Since we now
>> have two concrete inliner solutions, how about we have a BoF discussion
>> at the LLVM dev conference? I can send a proposal.
>>
>>
>>
>> What do you guys think?
>>
>>
>>
>> Here are some details on the scenarios we considered when tuning the
>> greedy inliner and other possible future scenarios. The first is the
>> A <- B <- C case mentioned earlier, where the call to B is inside a loop
>> in A. The B-to-A call site should get higher priority and be considered
>> before the C-to-B call site.
>>
>> A() {
>>   for (...) { call B() }
>> }
>>
>> B() {
>>   call C()
>> }
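
One way to read this scenario, sketched under stated assumptions: the weight is the loop depth of the call site, and inlining a callee re-queues the callee's own call sites in the caller with the depths added. Both rules and all names here are illustrative, not taken from the patch. The sketch also inlines every site it pops and assumes there is no recursion.

```python
import heapq


def visit_order(initial_sites, body_calls):
    """initial_sites: (caller, callee, loop_depth) triples.
    body_calls: callee -> [(inner_callee, loop_depth_inside_callee)]."""
    # Min-heap on negated depth, so the deepest-in-loop call site pops first.
    heap = [(-depth, seq, caller, callee)
            for seq, (caller, callee, depth) in enumerate(initial_sites)]
    heapq.heapify(heap)
    seq = len(heap)
    visited = []
    while heap:
        neg_depth, _, caller, callee = heapq.heappop(heap)
        visited.append((caller, callee, -neg_depth))
        # Assumption: after inlining, the callee's calls now live in the caller,
        # nested inside whatever loops surrounded the original call site.
        for inner, inner_depth in body_calls.get(callee, []):
            heapq.heappush(heap, (neg_depth - inner_depth, seq, caller, inner))
            seq += 1
    return visited


# A calls B inside one loop; B calls C outside any loop.
order = visit_order([("A", "B", 1), ("B", "C", 0)], {"B": [("C", 0)]})
```

Here the A-to-B site is visited first, the copy of the C call that lands in A's loop comes next, and the original C call in B is visited last.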
>>
>>
>>
>> The second is where A calls B many times and one call to B is in a loop
>> and needs to be inlined. The B() inside the loop should get higher
>> priority than the other calls to B in A(). Inlining the other calls may
>> not be beneficial when tuning for code size.
>>
>> A() {
>>   call B()
>>   call B()
>>   call B()
>>   for (...) { call B() }
>>   call B()
>> }
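
A small sketch of this case, assuming a simple per-caller size budget: call sites inside loops are taken first, and the remaining calls are inlined only while the budget lasts. The budget model, the sizes, and the function name are hypothetical.

```python
def pick_sites(sites, callee_size, budget):
    """sites: (index, in_loop) pairs in source order.
    Returns the indices chosen for inlining under the size budget."""
    chosen, spent = [], 0
    # sorted() is stable: loop sites come first, source order breaks ties
    for index, in_loop in sorted(sites, key=lambda s: not s[1]):
        if spent + callee_size > budget:
            break
        chosen.append(index)
        spent += callee_size
    return chosen


# Five calls to B in A(); only the fourth (index 3) is inside a loop.
sites = [(0, False), (1, False), (2, False), (3, True), (4, False)]
tight = pick_sites(sites, callee_size=40, budget=50)
loose = pick_sites(sites, callee_size=40, budget=100)
```

With the tight budget only the loop call site is inlined; a larger budget picks up the straight-line calls afterwards.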
>>
>>
>>
>> The third is a series of consecutive calls. On the architecture we
>> target, we don't want to inline them. The inliner needs a global view to
>> make this decision.
>>
>> A() {
>>   if (...) {
>>     call B();
>>     call B();
>>     call B();
>>     call B();
>>   } else if (...) {
>>     call B();
>>     call B();
>>     call B();
>>     call B();
>>   } else ...
>>   ...
>> }
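
A sketch of how such runs could be detected before any single call site is decided on, assuming each basic block is summarized as the ordered list of callees it calls. The `max_run` threshold and the function name are hypothetical; the email does not say how the patch recognizes this pattern.

```python
from itertools import groupby


def sites_to_skip(block_calls, max_run=2):
    """block_calls: callee names in one basic block, in program order.
    Returns positions that belong to a run of more than max_run
    back-to-back calls to the same callee."""
    skip, pos = [], 0
    for _callee, run in groupby(block_calls):
        length = len(list(run))
        if length > max_run:
            skip.extend(range(pos, pos + length))
        pos += length
    return skip


then_branch = sites_to_skip(["B", "B", "B", "B"])
mixed_block = sites_to_skip(["B", "C", "B"])
```

The decision depends on the neighbouring call sites, not on one call site alone, which is why a purely local cost check cannot express it.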
>>
>>
>>
>> The next one is a future scenario that supports profile-based decisions. I
>> considered this case, but it is not implemented in the current version of
>> the greedy inliner. Block frequency info can be used in the computation to
>> guide ordering and decisions.
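
A hedged sketch of how block frequency could feed the weight. This is described above as future work and is not in the patch, so the frequencies, the site names, and the formula (call-site block frequency relative to the caller's entry frequency) are all assumptions for illustration.

```python
def relative_frequency(block_freq, entry_freq):
    """How often the call site's block runs per entry into its caller."""
    if entry_freq == 0:
        return 0.0  # caller never executed in the profile: lowest priority
    return block_freq / entry_freq


# Hypothetical profile: call site -> (block frequency, caller entry frequency)
profile = {
    "cold_error_path": (1, 100),
    "straight_line": (100, 100),
    "hot_loop": (5000, 100),
}
ranked = sorted(profile, key=lambda s: relative_frequency(*profile[s]), reverse=True)
```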
>>
>>
>>
>> The key to handling the top-down/bottom-up differences and the scenarios
>> described above is an inliner framework built around a global queue with
>> sorted weights. It is a very flexible framework. Any future LLVM inliner
>> solution we decide on should support this type of feature.
>>
>>
>>
>> Yin
>>
>>
>>
>> *From:* llvm-commits-bounces at cs.uiuc.edu [mailto:
>> llvm-commits-bounces at cs.uiuc.edu] *On Behalf Of *Chandler Carruth
>> *Sent:* Sunday, August 03, 2014 11:51 PM
>> *To:* Jiangning Liu
>> *Cc:* Jiangning Liu; Commit Messages and Patches for LLVM
>>
>> *Subject:* Re: [PATCHES] A module inliner pass with a greedy call site
>> queue
>>
>>
>>
>> Just a brief note...
>>
>>
>>
>> On Sun, Aug 3, 2014 at 11:42 PM, Jiangning Liu <liujiangning1 at gmail.com>
>> wrote:
>>
>> 1. I measured the code size impact of Yin's patch; overall I don't see a
>> code size regression.
>>
>>
>>
>> 1) For the following C++ programs in SPEC, we have the following data.
>>
>>
>>
>> -O2 result:
>>
>>
>>
>> (text and data are section sizes; the pct columns are the change from
>> old to new)
>>
>> spec           old_text  old_data  new_text  new_data  text_pct  data_pct
>> 252.eon          302848      2232    297301      2312    -1.83%     3.58%
>> 450.soplex       366474      1536    389164      1656     6.19%     7.81%
>> 453.povray       898032     12632    850444     12632    -5.30%     0.00%
>> 471.omnetpp      685516      9136    693349      9128     1.14%    -0.09%
>> 473.astar         38999       860     41011       860     5.16%     0.00%
>> 483.xalancbmk   4282478    139376   4414286    139376     3.08%     0.00%
>> sum             6574347    165772   6685555    165964     1.69%     0.12%
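
The percentage columns match the relative change from the old size to the new size, which can be checked against the rows above. A small sketch; the helper name `pct_change` is hypothetical.

```python
def pct_change(old, new):
    """Relative change from old to new, in percent, rounded to two decimals."""
    return round((new - old) / old * 100, 2)


eon_text = pct_change(302848, 297301)      # 252.eon text section
sum_text = pct_change(6574347, 6685555)    # total text section
sum_data = pct_change(165772, 165964)      # total data section
```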
>>
>>
>> SPEC is highly misleading w.r.t. code size. Also, there are several
>> regressions in code size in addition to improvements. It would be useful to
>> get measurements from larger code bases.
>>
>
>