[PATCHES] A module inliner pass with a greedy call site queue

Gerolf Hoflehner ghoflehner at apple.com
Fri Aug 8 15:12:16 PDT 2014


Hi,

I’m not an inlining expert. I have no prejudice or preference about top-down/bottom-up etc., and tend to favor flexibility.

I would love to see more data for this hard heuristic problem:

-what about compile-time?
-did you get a chance to look into vpr? I’m curious about the specific explanation for the gain. Is this for O3 LTO PGO, O3 LTO or just O3?
-could you share data on SPEC2006 for the ref input set? 2006 has a much larger code footprint than 2000 and should reward us with more insight.

Thanks
Gerolf




On Aug 7, 2014, at 12:14 AM, Jiangning Liu <liujiangning1 at gmail.com> wrote:

> Hi Yin,
> 
> Sorry, I previously didn't notice the command-line option "-mllvm -inline-perf-mode=true", because your test case doesn't use it. I have now measured performance on Cortex-A57 again with the options "-mllvm -greedy-inliner=true -mllvm -inline-perf-mode=true".
> 
> spec2000     greedy_inliner_perf  threshold_1000_perf  greedy_inliner_code_size  threshold_1000_code_size
> 164.gzip       0.00%               -0.78%                6.25%                    14.55%
> 175.vpr       -4.09%               -3.14%                1.84%                    14.49%
> 176.gcc        0.83%                0.83%                0.08%                    33.16%
> 181.mcf        0.00%                0.00%                3.58%                    19.58%
> 186.crafty    -0.93%                1.85%               -0.94%                    14.38%
> 197.parser    -1.61%               -2.24%               -0.04%                     1.48%
> 252.eon       -7.30%               -6.52%                2.64%                     6.42%
> 253.perlbmk   -2.38%               -3.76%               -1.75%                     2.22%
> 254.gap        0.00%               -1.72%                2.93%                    18.44%
> 255.vortex    -1.04%               -4.19%                3.40%                    47.07%
> 256.bzip2      1.40%               -1.83%                2.23%                    10.11%
> 300.twolf      1.87%               -0.36%               -1.87%                    23.48%
> 177.mesa      -3.36%               -2.52%                0.48%                    35.04%
> 179.art        1.37%                0.00%                0.45%                     9.26%
> 183.equake    -4.35%               -5.80%                1.67%                    23.23%
> 188.ammp       0.35%                1.75%                0.07%                     6.69%
> 
> So now this performance result looks quite promising!
> 
> For xxx_perf, a negative number means running time is reduced, i.e. performance is better.
> For xxx_code_size, the number covers only the .text section.
> 
> From this result we can see:
> 
> 1) The greedy inliner obtains a performance improvement similar to that of setting the threshold to 1000 with the original inliner.
> 2) Compared with the significant code size bloat at threshold 1000, the code size change with the greedy inliner is quite limited on average.
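
The two observations above can be checked with simple arithmetic on the table. The following is only a rough sketch: it takes the plain arithmetic mean of each column, which is an assumption about what "on average" means here (a geometric mean of the underlying run times would be a more usual choice for SPEC), and the variable names are illustrative.

```python
from statistics import mean

# Rows copied from the table above:
# (benchmark, greedy perf %, threshold-1000 perf %,
#  greedy .text size %, threshold-1000 .text size %)
rows = [
    ("164.gzip",     0.00, -0.78,  6.25, 14.55),
    ("175.vpr",     -4.09, -3.14,  1.84, 14.49),
    ("176.gcc",      0.83,  0.83,  0.08, 33.16),
    ("181.mcf",      0.00,  0.00,  3.58, 19.58),
    ("186.crafty",  -0.93,  1.85, -0.94, 14.38),
    ("197.parser",  -1.61, -2.24, -0.04,  1.48),
    ("252.eon",     -7.30, -6.52,  2.64,  6.42),
    ("253.perlbmk", -2.38, -3.76, -1.75,  2.22),
    ("254.gap",      0.00, -1.72,  2.93, 18.44),
    ("255.vortex",  -1.04, -4.19,  3.40, 47.07),
    ("256.bzip2",    1.40, -1.83,  2.23, 10.11),
    ("300.twolf",    1.87, -0.36, -1.87, 23.48),
    ("177.mesa",    -3.36, -2.52,  0.48, 35.04),
    ("179.art",      1.37,  0.00,  0.45,  9.26),
    ("183.equake",  -4.35, -5.80,  1.67, 23.23),
    ("188.ammp",     0.35,  1.75,  0.07,  6.69),
]

# Transpose the rows and drop the benchmark-name column.
columns = list(zip(*rows))[1:]
averages = [mean(col) for col in columns]

labels = ["greedy perf", "threshold-1000 perf",
          "greedy .text size", "threshold-1000 .text size"]
for label, avg in zip(labels, averages):
    print(f"{label}: {avg:+.2f}%")
```

Under that assumption the mean running-time change is about -1.20% for the greedy inliner versus about -1.78% for threshold 1000, while the mean .text growth is about 1.31% versus about 17.5%.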
> 
> Thanks,
> -Jiangning
> 
> 
> 2014-08-05 17:11 GMT+08:00 Jiangning Liu <liujiangning1 at gmail.com>:
> Hi Yin,
> 
> I don't see a performance improvement on Cortex-A57 for eon with your patch. The spec2000/int data is below (negative is good):
> 
> 164.gzip	0.00%
> 175.vpr	-4.55%
> 176.gcc	0.83%
> 181.mcf	0.00%
> 186.crafty	0.00%
> 197.parser	-2.26%
> 252.eon	1.46%
> 253.perlbmk	5.24%
> 254.gap	0.88%
> 255.vortex	-0.52%
> 256.bzip2	0.47%
> 300.twolf	1.87%
> 
> Thanks,
> -Jiangning
> 
> 
> 
> 2014-08-05 10:52 GMT+08:00 Jiangning Liu <liujiangning1 at gmail.com>:
> 
> Yin,
> 
> I got the following "make check-all" failure.
> 
> /home/jialiu01/llvm/llvm/tools/clang/test/Driver/greedy-inliner.c:8:11: error: expected string not found in input
> // CHECK: Greedy Inliner
>           ^
> <stdin>:1:1: note: scanning from here
> clang (LLVM option parsing): for the -print-after option: Cannot find option named 'greedy-inliner'!
> ^
> 
> Can you confirm whether this is an issue?
> 
> As for performance, I don't have the data on Cortex-A57 yet; I will let you know as soon as I get the results. I have never tried Cortex-A53.
> 
> Thanks,
> -Jiangning
> 
> 
> 
> 2014-08-05 6:06 GMT+08:00 Yin Ma <yinma at codeaurora.org>:
> 
> Hi All,
> 
>  
> 
> Thanks to Jiangning for the comprehensive testing of the greedy inliner. I am aware of Chandler's discussion about rewriting the pass manager in order to overcome the limitations of the current inliner, and of the intention to move toward the perfect solution.
> 
>  
> 
> But we had to provide an inliner solution to address some cases where LLVM performs worse than GCC. That is how the greedy inliner was born. Because this inliner is a module pass, it does not have the SCC<->function analysis problem. Note that the greedy inliner is a flexible solution: it can be set up for bottom-up, top-down, or some other custom order (that is the purpose of using a queue with sorted weights).
> 
>  
> 
> Regarding code size, for our very large internal C++ code base, the greedy inliner did a better job than the SCC inliner at -Os. It was able to inline more functions than the SCC inliner without increasing code size. In one instance, the files generated by the two inliners were quite similar in size. However, looking at the number of entries in the symbol table, the greedy inliner version had 540880 entries, while the SCC inliner version had 619639 entries. This was achieved by setting weights to favor top-down order. Chandler, if you have any large code base examples in mind, I would like to try them.
> 
>  
> 
> Regarding performance, the greedy inliner has also shown better performance than the SCC inliner. I already reported the gains for SPEC2000 (eon 16%, mesa 5%) without any degradation in other tests. Jiangning also verified this independently. This was achieved by setting weights to favor call sites in loops.
> 
>  
> 
> For virtual dispatch, we didn't see any C++ virtual dispatch problems when evaluating the greedy inliner, because the greedy inliner reuses the SCC inliner to make the local decision. If anyone has a test case or program in mind, I can run it and report the findings.
> 
>  
> 
> I like the suggestion from Hal to have a "more in-depth discussion on the goals of inlining, and how we do, or plan to, achieve them." Since we now have two concrete inliner solutions, how about we have a BoF discussion at the LLVM dev conference? I can send a proposal.
> 
>  
> 
> What do you guys think?
> 
>  
> 
> Here are some details on the scenarios we considered when tuning the greedy inliner, and on other possible future scenarios. The first is the A <- B <- C case mentioned earlier, where the call to B is inside a loop in A. Inlining B into A should have higher priority and be considered before inlining C into B.
> 
> A() {
>   for (...) { call B() }
> }
> 
> B() {
>   call C()
> }
> 
>  
> 
> The second is the case where A calls B many times and one of the calls is in a loop and needs to be inlined. The call to B() inside the loop should have higher priority than the other calls to B in A(). Inlining the other calls may not be beneficial when tuning for code size.
> 
> A() {
>   call B()
>   call B()
>   call B()
>   for (...) { call B() }
>   call B()
> }
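
One way to picture this second scenario is a greedy pass that visits call sites in order of loop depth and stops inlining once a code-size budget is used up. The sketch below is an illustration only, not the patch's implementation: the `pick_call_sites` function, the per-site size cost and the budget value are all hypothetical.

```python
import heapq

def pick_call_sites(call_sites, size_budget):
    """Greedily choose call sites to inline.

    call_sites: list of (name, loop_depth, size_cost) tuples.
    Deeper loop nesting is visited first; a site is inlined only if
    its (hypothetical) size cost still fits in the remaining budget.
    """
    # heapq is a min-heap, so negate the depth; the index keeps ties
    # in source order.
    heap = [(-depth, i, name, cost)
            for i, (name, depth, cost) in enumerate(call_sites)]
    heapq.heapify(heap)

    chosen, remaining = [], size_budget
    while heap:
        _, _, name, cost = heapq.heappop(heap)
        if cost <= remaining:
            chosen.append(name)
            remaining -= cost
    return chosen

# The five calls to B in A(); only the fourth sits inside a loop.
sites = [
    ("B#1", 0, 40),
    ("B#2", 0, 40),
    ("B#3", 0, 40),
    ("B#loop", 1, 40),
    ("B#5", 0, 40),
]
chosen = pick_call_sites(sites, size_budget=50)
print(chosen)
```

With a budget that only fits one copy of B, the call inside the loop is the one that gets inlined, and the straight-line calls are left alone.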
> 
>  
> 
> The third is a series of consecutive calls. On the architecture we targeted, we don't want to inline them. The inliner should have a global view when making the decision.
> 
> A() {
>   if (...) {
>     call B();
>     call B();
>     call B();
>     call B();
>   } else if (...) {
>     call B();
>     call B();
>     call B();
>     call B();
>   } else ...
>   ...
> }
> 
>  
> 
> The next one is a future scenario that supports profile-based decisions. I considered this case but did not implement it in the current version of the greedy inliner. Block frequency info can be used in the computation to guide order and decisions.
> 
>  
> 
> The key to handling the top-down/bottom-up differences and the scenarios described above is an inliner framework built around a global queue with sorted weights. It is a very flexible framework. Any future LLVM inliner solution we decide on should support this type of feature.
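
As a rough illustration of why a single weighted queue can cover both visiting orders, the sketch below feeds the same call sites through one queue with two different weight functions. Everything here is assumed for the example: the `greedy_order` helper, the `depth` table (distance of each caller from the call-graph root) and the two weight functions are not taken from the patch.

```python
import heapq

def greedy_order(call_sites, weight):
    """Return call sites in the order a global weighted queue pops them.

    call_sites: list of (caller, callee) pairs.
    weight: function mapping a call site to a number; larger is earlier.
    """
    # Negate the weight because heapq is a min-heap; the index breaks
    # ties deterministically.
    heap = [(-weight(cs), i, cs) for i, cs in enumerate(call_sites)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

# Hypothetical call chain main -> A -> B -> C, with each caller's
# distance from the root of the call graph.
depth = {"main": 0, "A": 1, "B": 2}
sites = [("A", "B"), ("B", "C"), ("main", "A")]

def top_down(cs):
    # Callers closer to the root get the larger weight.
    return -depth[cs[0]]

def bottom_up(cs):
    # Callers farther from the root get the larger weight.
    return depth[cs[0]]

top_down_order = greedy_order(sites, top_down)
bottom_up_order = greedy_order(sites, bottom_up)
print(top_down_order)
print(bottom_up_order)
```

Only the weight function changes between the two runs. A weight that also adds loop depth or block frequency would slot into the same queue.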
> 
>  
> 
> Yin
> 
>  
> 
> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Chandler Carruth
> Sent: Sunday, August 03, 2014 11:51 PM
> To: Jiangning Liu
> Cc: Jiangning Liu; Commit Messages and Patches for LLVM
> 
> 
> Subject: Re: [PATCHES] A module inliner pass with a greedy call site queue
> 
>  
> 
> Just a brief note...
> 
>  
> 
> On Sun, Aug 3, 2014 at 11:42 PM, Jiangning Liu <liujiangning1 at gmail.com> wrote:
> 
> 1. I measured the code size impact of Yin's patch; overall I don't see a code size regression.
> 
>  
> 
> 1) For the following C++ programs in SPEC, we have the following data.
> 
>  
> 
> -O2 result:
> 
>  
> 
> spec           old_text_section  old_data_section  new_text_section  new_data_section  text_percentage  data_percentage
> 252.eon        302848            2232              297301            2312              -1.83%           3.58%
> 450.soplex     366474            1536              389164            1656              6.19%            7.81%
> 453.povray     898032            12632             850444            12632             -5.30%           0.00%
> 471.omnetpp    685516            9136              693349            9128              1.14%            -0.09%
> 473.astar      38999             860               41011             860               5.16%            0.00%
> 483.xalancbmk  4282478           139376            4414286           139376            3.08%            0.00%
> sum            6574347           165772            6685555           165964            1.69%            0.12%
> 
> 
> SPEC is highly misleading w.r.t. code size. Also, there are several regressions in code size in addition to improvements. It would be useful to get measurements from larger code bases.
> 
> 
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
