<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">Hi David,</div><div class="gmail_quote"><br></div><div class="gmail_quote">Thanks for your feedback...</div><div class="gmail_quote"><br></div><div class="gmail_quote">2015-03-18 14:25 GMT+08:00 Xinliang David Li <span dir="ltr"><<a href="mailto:xinliangli@gmail.com" target="_blank">xinliangli@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">+Easwaran who is working on improving LLVM inliner with a more sophisticated cost-model. </div></blockquote><div><br></div><div>I'm not sure whether a more sophisticated cost model can work well for inlining. The current inlining cost models code size impact only, so I assume you are talking about a cost model for performance. The inliner runs at a very early stage of LLVM compilation, and it's really hard to estimate performance impact accurately there. For example, you would have to introduce a register cost model to account for register spill overhead, but I personally think it's hard to be accurate at such an early stage of LLVM compilation. Also, if we want it to be accurate, we would have to pay a compile-time cost. I think compile time is one of LLVM's advantages over GCC that we don't want to lose. </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">The cases mentioned in your patch will be covered. 
Profile (including static profile) data will also be used in the analysis.<br></div></blockquote><div><br></div><div>My patch isn't intended to cover PGO.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote"><span class="">On Tue, Mar 17, 2015 at 10:46 PM, Jiangning Liu <span dir="ltr"><<a href="mailto:liujiangning1@gmail.com" target="_blank">liujiangning1@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi chandlerc, apazos, yinma, hfinkel,<br>

<br>

Following the discussion in the BoF session at the 2014 LLVM dev meeting, I did some experiments to enhance the LLVM inliner and want to share my results so far. My main goal is to improve -O3 performance without profiling support, which should be the simplest scenario for using compiler optimization.<br>

<br>

Inlining more code usually improves performance at the cost of code size bloat, but inlining too aggressively can increase register pressure and hurt performance; e.g. more loop invariants get detected and hoisted out of loops, and register pressure ends up much higher. Meanwhile, inlining is expensive at compile time because we have to analyze every function at every call site with different arguments to remove as much dead code as possible. Therefore, the biggest challenge of the inlining problem is how to trade off performance improvement, code size bloat and compiler slowdown in a smart manner.<br>

<br>

1. Design<br>

<br>

My design to address the issues described above is as follows.<br>

<br>

(1) For performance, the main idea is to enlarge the inlining threshold heuristically for *hot* spots detected at compile time. Code with the following properties is usually *hot*:<br>

(1.a) Callee inside a loop. If the callee can be inlined into a loop, we can probably expose more optimization opportunities, e.g. loop-invariant hoisting. This is particularly useful for small loops, such as those with fewer than 2~3 BasicBlocks, because such a simple loop structure is less likely to trigger register pressure issues.<br>

(1.b) Callee with a constant argument. For example, if the constant argument is used as a loop bound, it can trigger completely different loop unrolling behavior, such as full or partial unrolling.<br>

<br>
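A minimal sketch of how the heuristics in (1.a) and (1.b) could raise the threshold for a call site. The struct, the function name and the bonus values are hypothetical and chosen only for illustration; they are not taken from the patch.<br>

```cpp
// Hypothetical description of one call site; not an LLVM data structure.
struct CallSiteInfo {
  bool InLoop;             // (1.a) the call sits inside a loop
  unsigned LoopNumBlocks;  // basic blocks in that loop (ignored if !InLoop)
  bool HasConstantArg;     // (1.b) at least one argument is a constant
};

// Assumed bonus values, for illustration only.
constexpr int LoopBonus = 50;
constexpr int SmallLoopBonus = 100;
constexpr unsigned SmallLoopMaxBlocks = 3;
constexpr int ConstantArgBonus = 50;

// Returns the inlining threshold for one call site: the base threshold,
// enlarged when the call site looks hot according to (1.a) and (1.b).
int adjustedThreshold(const CallSiteInfo &CS, int BaseThreshold) {
  int Threshold = BaseThreshold;
  if (CS.InLoop) {
    // Small loops get the larger bonus because they are less likely to
    // run into register pressure problems after inlining.
    Threshold += CS.LoopNumBlocks <= SmallLoopMaxBlocks ? SmallLoopBonus
                                                        : LoopBonus;
  }
  if (CS.HasConstantArg)
    Threshold += ConstantArgBonus;
  return Threshold;
}
```

With a base threshold of 240 and these assumed bonuses, a call site with no hot property keeps 240, while a call inside a two-block loop gets 340.<br>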

Solution (1.a) requires loop info. With the current pass manager, a CallGraphSCCPass cannot use getAnalysisUsage to obtain loop info, but we can define a lightweight LoopAnalyzer pass inside the SimpleInliner module, and this pass can be implemented simply by calling LoopInfoBase and DominatorTree.<br></blockquote><div><br></div></span><div>Chandler's new pass manager is designed to handle this.</div></div></div></div></blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class=""><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

(2) For code size, we have two solutions:<br>

(2.a) It doesn't make sense to inline a lot of *cold* code. Since non-hot code can be treated as cold code, we can reduce the normal threshold. In the patch the default threshold for -O3 is changed from 275 to 240, which saves a lot of code size. The performance lost by reducing the default threshold can be compensated for by increasing the threshold for *hot* code inside loops.<br>

(2.b) It would be quite abnormal for a function to call the same callee many times, even with different arguments, because this kind of code can easily be refactored into a loop. So we can avoid inlining the same callee many times if we find this case.<br></blockquote><div><br></div></span><div>This simple heuristic is not always valid. For instance, the '[ ]' operator for a container can be invoked many times with different arguments. Inlining them can potentially expose CSE opportunities across inline instances of the same callee.</div></div></div></div></blockquote><div><br></div><div>Agreed, but I never said it wouldn't hurt performance in some cases; I just want to cover the most reasonable scenarios. If this is the case, I was hoping the programmer would change the code to use a loop.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class=""><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

(3) Compile time is a big challenge, because computing loop info is really expensive.<br>

(3.a) Don’t re-compute loop info every time a callee is inlined; only do it once we start checking the new callees introduced by inlining. For example, with A->B->C and A->D->E, when analyzing caller A, if we decide to inline B into A, C is exposed to A, but we don’t re-compute loop info until checking A->D is complete, because the loop info relevant to D isn’t affected by inlining B.<br>

(3.b) Solve the A->B->C dilemma differently using early exit. For example, take the call graph A->B->C and A->B->D. When analyzing caller B, if A->B->C passes the ABC check, i.e. C can be inlined into B and (B+C) can be inlined into A as well, the current algorithm defers the decision until analyzing caller A. But if D gets inlined into B before caller A is checked, the code size of B can grow, and B finally fails to be inlined into A. (Hal explained this problem previously using the vector push_back case.) This means A->B->C is eventually kept as it is, although D is inlined into B. This is *not* a problem, but a heuristic choice, I think. In a lot of C++ programs, there are many small functions that can trigger this ABC issue. But choosing B->D rather than A->B->C hurts compile time, because all callees inside B are checked even though the ABC case has already been detected. So we can exit early as soon as a positive ABC case is detected, and the new algorithm then inlines B into A first, when analyzing caller A. C and D can then both be inlined into A eventually.<br></blockquote><div><br></div></span><div>For IPA, the loop info/loop tree representation can be trimmed to be much leaner. Also it should support incremental update. </div></div></div></div></blockquote><div><br></div><div>It's not clear to me whether the IPA you are talking about crosses module boundaries. Can you clarify?</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div>David</div><div> </div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5">

<br>

To apply methods (2.b) and (3.a), we have to solve an inline analysis ordering issue. The current inliner analyzes call sites in an unstable order. For example, with A->B1->C1, A->B2->C2 and A->B3->C3, the call site analysis order when analyzing caller A was B1, C1, B3, C3, B2, C2. I have changed the order to B1, B2, B3, C1, C2, C3.<br>
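<br>
The new ordering can be pictured as a first-in, first-out worklist: all direct call sites of the caller are analyzed first, and call sites exposed by inlining are appended at the end. The sketch below is only an illustration under simplifying assumptions: the CallMap type and analysisOrder function are hypothetical names, every call site is assumed to be inlined, and the call graph is assumed to be acyclic. It is not the code from the patch.<br>

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical call graph: maps a function name to its callees in source order.
using CallMap = std::map<std::string, std::vector<std::string>>;

// Returns the order in which call sites are analyzed for Caller, assuming
// every analyzed call site gets inlined and the call graph has no cycles.
std::vector<std::string> analysisOrder(const CallMap &Calls,
                                       const std::string &Caller) {
  std::vector<std::string> Worklist;
  auto It = Calls.find(Caller);
  if (It != Calls.end())
    Worklist = It->second;  // direct call sites come first

  // Index-based loop because the worklist grows while we walk it.
  for (std::size_t I = 0; I < Worklist.size(); ++I) {
    const std::string Callee = Worklist[I];  // copy: push_back may reallocate
    auto CI = Calls.find(Callee);
    if (CI == Calls.end())
      continue;  // leaf function, inlining it exposes nothing new
    // Call sites exposed by inlining Callee are queued behind existing ones.
    for (const std::string &Exposed : CI->second)
      Worklist.push_back(Exposed);
  }
  return Worklist;
}
```

For the A->B1->C1, A->B2->C2, A->B3->C3 example, calling analysisOrder with caller A returns B1, B2, B3, C1, C2, C3.<br>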

<br>

2. Benchmark<br>

<br>

Chandler previously mentioned that the SPEC benchmarks are not good candidates for measuring code size impact, so I use the LLVM bootstrap and Chromium as the benchmarks for compile time and code size.<br>

<br>

On LLVM revision r232011 (March 12), I got the following benchmark data:<br>

<br>

1) Performance:<br>

SPEC 2000 geomean for AArch64: +1.24%<br>

SPEC 2006 geomean for AArch64: +0.3%<br>

2) Code size:<br>

* SPEC 2000+2006: +2.68%<br>

* clang/llvm: +2.88%<br>

* Chromium: +2%~3%<br>

3) Compile-time:<br>

* llvm bootstrap on x86: +1.8%<br>

* SPEC2006 build on x86: +2.7%<br>

<br>

Thanks,<br>

-Jiangning<br>

<br>

REPOSITORY<br>

  rL LLVM<br>

<br>

<a href="http://reviews.llvm.org/D8408" target="_blank">http://reviews.llvm.org/D8408</a><br>

<br>

Files:<br>

  include/llvm/Transforms/IPO/InlinerPass.h<br>

  lib/Transforms/IPO/InlineSimple.cpp<br>

  lib/Transforms/IPO/Inliner.cpp<br>

  test/Transforms/Inline/inline-loop.ll<br>

  test/Transforms/Inline/inline-misc.ll<br>

<br>


<br></div></div>

<br></blockquote></div><br></div></div>

</blockquote></div><br></div></div>