<div dir="ltr">Hi,<div><br></div><div>1. I measured code size impact by Yin's patch, overall I don't see code size regression.</div><div><br></div><div>1) For the following cpp program in SPEC, we have the following data.</div>

<div><br></div><div>-O2 result:</div><div><br></div><div><div>spec<span class="" style="white-space:pre"> </span>old_text_section<span class="" style="white-space:pre">  </span>old_data_section<span class="" style="white-space:pre">  </span>new_text_section<span class="" style="white-space:pre">  </span>new_text_section<span class="" style="white-space:pre">  </span>text_percentage<span class="" style="white-space:pre">   </span>data_percentage</div>

</div><div>252.eon<span style="white-space:pre-wrap">       </span>302848<span style="white-space:pre-wrap">  </span>2232<span style="white-space:pre-wrap">    </span>297301<span style="white-space:pre-wrap">  </span>2312<span style="white-space:pre-wrap">    </span>-1.83%<span style="white-space:pre-wrap">  </span>3.58%<br>

</div><div>


<div>450.soplex<span style="white-space:pre-wrap">        </span>366474<span style="white-space:pre-wrap">  </span>1536<span style="white-space:pre-wrap">    </span>389164<span style="white-space:pre-wrap">  </span>1656<span style="white-space:pre-wrap">    </span>6.19%<span style="white-space:pre-wrap">   </span>7.81%</div>


<div>453.povray<span style="white-space:pre-wrap">        </span>898032<span style="white-space:pre-wrap">  </span>12632<span style="white-space:pre-wrap">   </span>850444<span style="white-space:pre-wrap">  </span>12632<span style="white-space:pre-wrap">   </span>-5.30%<span style="white-space:pre-wrap">  </span>0.00%</div>


<div>471.omnetpp<span style="white-space:pre-wrap">       </span>685516<span style="white-space:pre-wrap">  </span>9136<span style="white-space:pre-wrap">    </span>693349<span style="white-space:pre-wrap">  </span>9128<span style="white-space:pre-wrap">    </span>1.14%<span style="white-space:pre-wrap">   </span>-0.09%</div>


<div>473.astar<span style="white-space:pre-wrap"> </span>38999<span style="white-space:pre-wrap">   </span>860<span style="white-space:pre-wrap">     </span>41011<span style="white-space:pre-wrap">   </span>860<span style="white-space:pre-wrap">     </span>5.16%<span style="white-space:pre-wrap">   </span>0.00%</div>


<div>483.xalancbmk<span style="white-space:pre-wrap">     </span>4282478<span style="white-space:pre-wrap"> </span>139376<span style="white-space:pre-wrap">  </span>4414286<span style="white-space:pre-wrap"> </span>139376<span style="white-space:pre-wrap">  </span>3.08%<span style="white-space:pre-wrap">   </span>0.00%</div>


<div>sum<span style="white-space:pre-wrap">       </span>6574347<span style="white-space:pre-wrap"> </span>165772<span style="white-space:pre-wrap">  </span>6685555<span style="white-space:pre-wrap"> </span>165964<span style="white-space:pre-wrap">  </span>1.69%<span style="white-space:pre-wrap">   </span>0.12%</div>


</div><div><br></div><div>-Os result:</div><div><br></div><div><div>spec<span class="" style="white-space:pre">     </span>old_text_section<span class="" style="white-space:pre">  </span>old_data_section<span class="" style="white-space:pre">  </span>new_text_section<span class="" style="white-space:pre">  </span>new_text_section<span class="" style="white-space:pre">  </span>text_percentage<span class="" style="white-space:pre">   </span>data_percentage</div>

<div>252.eon<span class="" style="white-space:pre">     </span>282163<span class="" style="white-space:pre">    </span>2256<span class="" style="white-space:pre">      </span>280966<span class="" style="white-space:pre">    </span>2280<span class="" style="white-space:pre">      </span>-0.42%<span class="" style="white-space:pre">    </span>1.06%</div>

<div>450.soplex<span class="" style="white-space:pre">  </span>311515<span class="" style="white-space:pre">    </span>1544<span class="" style="white-space:pre">      </span>294294<span class="" style="white-space:pre">    </span>1568<span class="" style="white-space:pre">      </span>-5.53%<span class="" style="white-space:pre">    </span>1.55%</div>

<div>453.povray<span class="" style="white-space:pre">  </span>759599<span class="" style="white-space:pre">    </span>12608<span class="" style="white-space:pre">     </span>745275<span class="" style="white-space:pre">    </span>12608<span class="" style="white-space:pre">     </span>-1.89%<span class="" style="white-space:pre">    </span>0.00%</div>

<div>471.omnetpp<span class="" style="white-space:pre"> </span>657335<span class="" style="white-space:pre">    </span>9144<span class="" style="white-space:pre">      </span>652195<span class="" style="white-space:pre">    </span>9128<span class="" style="white-space:pre">      </span>-0.78%<span class="" style="white-space:pre">    </span>-0.17%</div>

<div>473.astar<span class="" style="white-space:pre">   </span>32439<span class="" style="white-space:pre">     </span>860<span class="" style="white-space:pre">       </span>32847<span class="" style="white-space:pre">     </span>860<span class="" style="white-space:pre">       </span>1.26%<span class="" style="white-space:pre">     </span>0.00%</div>

<div>483.xalancbmk<span class="" style="white-space:pre">       </span>3935159<span class="" style="white-space:pre">   </span>139392<span class="" style="white-space:pre">    </span>3854060<span class="" style="white-space:pre">   </span>139376<span class="" style="white-space:pre">    </span>-2.06%<span class="" style="white-space:pre">    </span>-0.01%</div>

<div>summary<span class="" style="white-space:pre">     </span>5978210<span class="" style="white-space:pre">   </span>165804<span class="" style="white-space:pre">    </span>5859637<span class="" style="white-space:pre">   </span>165820<span class="" style="white-space:pre">    </span>-1.98%<span class="" style="white-space:pre">    </span>0.01%</div>

</div><div><br></div><div>2) For bzip2 being cared by Nick, I do see code size reduction like below.<span style="font-family:arial,sans-serif;font-size:13.63636302947998px"> </span><br></div><div><br></div><div>-O2</div><div>

<br></div><div><div>spec<span class="" style="white-space:pre">       </span>old_text_section<span class="" style="white-space:pre">  </span>old_data_section<span class="" style="white-space:pre">  </span>new_text_section<span class="" style="white-space:pre">  </span>new_text_section<span class="" style="white-space:pre">  </span>text_percentage<span class="" style="white-space:pre">   </span>data_percentage</div>

<div>256.bzip2<span class="" style="white-space:pre">   </span>41076<span class="" style="white-space:pre">     </span>3892<span class="" style="white-space:pre">      </span>40188<span class="" style="white-space:pre">     </span>3892<span class="" style="white-space:pre">      </span>-2.16%<span class="" style="white-space:pre">    </span>0.00%</div>

</div><div><div>401.bzip2<span class="" style="white-space:pre">    </span>68435<span class="" style="white-space:pre">     </span>3844<span class="" style="white-space:pre">      </span>66787<span class="" style="white-space:pre">     </span>3844<span class="" style="white-space:pre">      </span>-2.41%<span class="" style="white-space:pre">    </span>0.00%</div>

</div><div><br></div><div>-Os</div><div><br></div><div><div>spec<span class="" style="white-space:pre">     </span>old_text_section<span class="" style="white-space:pre">  </span>old_data_section<span class="" style="white-space:pre">  </span>new_text_section<span class="" style="white-space:pre">  </span>new_text_section<span class="" style="white-space:pre">  </span>text_percentage<span class="" style="white-space:pre">   </span>data_percentage</div>

</div><div><div>256.bzip2<span class="" style="white-space:pre">    </span>29750<span class="" style="white-space:pre">     </span>3892<span class="" style="white-space:pre">      </span>28894<span class="" style="white-space:pre">     </span>3892<span class="" style="white-space:pre">      </span>-2.88%<span class="" style="white-space:pre">    </span>0.00%</div>

</div><div><div>401.bzip2<span class="" style="white-space:pre">    </span>51845<span class="" style="white-space:pre">     </span>3844<span class="" style="white-space:pre">      </span>51429<span class="" style="white-space:pre">     </span>3844<span class="" style="white-space:pre">      </span>-0.80%<span class="" style="white-space:pre">    </span>0.00%</div>

</div><div><br></div><div>2. I spent some time studying the original inlining algorithm and the new one provided by Yin, and got some thoughts,</div><div><br></div><div>1) It's clear that this new inline algorithm doesn't really want to *replace* the original one, and I can see the following code is to call the original SCC inliner for local decision making purpose.,</div>

<div><br></div><div><div>    CallGraphNode Node(Caller);</div><div>    std::vector<CallGraphNode*> NodeVec;</div><div>    NodeVec.push_back(&Node);</div><div>    CallGraphSCC  SCC(NULL);</div><div>    SCC.initialize(&NodeVec[0], &NodeVec[0]+NodeVec.size());</div>

<div>    FuncInliner->setPreferredCallSite(CS);</div><div>    FuncInliner->setBonusThreshold(BonusThreshold);</div><div>    bool LocalChanged = FuncInliner->runOnSCC(SCC);</div><div>    Changed |= LocalChanged;</div>

</div><div><br></div><div>So the special tunings from SCC Inliner should be able to be inherited accordingly. So my understanding is the tuning of "<span style="font-family:arial,sans-serif;font-size:13.63636302947998px">the impact of inlining on virtual dispatch" from current SCC Inliner should be kept naturally. Maybe my undertstanding is wrong here, so Yin can confirm around this.</span></div>

<div><br></div><div>2) I can see the point that Chandler is working on improving current pass manager to help solving this inlining B to A issue for A->B->C, but I personally think this new GreedyInliner modeling is still meaningful, because</div>

<div><br></div><div>A. It can solve the performance issue of 252.eon and 177.mesa. At present, we have big gap for those two benchmarks comparing with GCC, and we want this improvement very much for our customer and stake holders. In reality, I personally like to see we raise the performance bar under the condition of not hurting code size.  Also, I don't see code size regression for them at all.</div>

<div><br></div><div><div>177.mesa<span class="" style="white-space:pre">        </span>483677<span class="" style="white-space:pre">    </span>972<span class="" style="white-space:pre">       </span>479625<span class="" style="white-space:pre">    </span>972<span class="" style="white-space:pre">       </span>-0.84%<span class="" style="white-space:pre">    </span>0.00%</div>

<div>252.eon<span class="" style="white-space:pre">     </span>302848<span class="" style="white-space:pre">    </span>2232<span class="" style="white-space:pre">      </span>297301<span class="" style="white-space:pre">    </span>2312<span class="" style="white-space:pre">      </span>-1.83%<span class="" style="white-space:pre">    </span>3.58%</div>

</div><div><div><br></div></div><div>B. I can see this GreedyInliner doesn't change current infrastructure at all, and it is simply an extension of current SCC inliner. I don't see any blocking issue for Chandler's pass manager improvement. For long run, if we can see even better result for both performance and code size in future with the new pass manager infrastructure, we can still get it removed. But at least, again, raising the bar is important for us to build high quality product with llvm.</div>

<div><br></div><div>C. From the GreedyInliner design itself, I do see some valuable points, although we all know it is a heuristic problem and it can't be perfect. For example, as James mentioned, <span style="font-family:arial,sans-serif;font-size:13.63636302947998px">taking into account important factors such as </span><span style="font-family:arial,sans-serif;font-size:13.63636302947998px">loop nest depth. Actually my understanding of this GreedyInliner is essentially it provides an even flexible call site parsing ordering capability with some enhanced modeling rules. If the modeling itself is simply just SCC, then it would degrade to be the original SCC Inliner. </span></div>

<div><span style="font-family:arial,sans-serif;font-size:13.63636302947998px"><br></span></div><div><span style="font-family:arial,sans-serif;font-size:13.63636302947998px">3. Therefore, my suggestions to Yin are, </span></div>

<div><span style="font-family:arial,sans-serif;font-size:13.63636302947998px"><br></span></div><div><span style="font-family:arial,sans-serif;font-size:13.63636302947998px">1) Make an even clear call site queue interface, and then we would be able to have different inlining heuristic building blocks to be supported, no matter whether it is top-down or bottom-up. For example, to be specific, the following code should be able to be abstracted with an interface.</span></div>

<div><span style="font-family:arial,sans-serif;font-size:13.63636302947998px"><br></span></div><div><div><font face="arial, sans-serif">  // Collect all call sites and compute function size.</font></div><div><font face="arial, sans-serif">  for (Module::iterator it = M.begin(); it != M.end(); ++it) {</font></div>

<div><font face="arial, sans-serif">    Function* F = &*it;</font></div><div><font face="arial, sans-serif">    if (F->isDeclaration())</font></div><div><font face="arial, sans-serif">      continue;</font></div><div>

<font face="arial, sans-serif"><br></font></div><div><font face="arial, sans-serif">    unsigned long BBCount;</font></div><div><font face="arial, sans-serif">    unsigned long InstCount;</font></div><div><font face="arial, sans-serif">    FunctionCallCount[F] = CollectFunctionCallSites(F, CallSites, BBCount,</font></div>

<div><font face="arial, sans-serif">                                                    InstCount);</font></div><div><font face="arial, sans-serif">    unsigned long FS = InstCount + BasicBlockSizeCost * BBCount;</font></div>

<div><font face="arial, sans-serif">    FunctionSizes[F] = FS;</font></div><div><font face="arial, sans-serif"><br></font></div><div><font face="arial, sans-serif">    // We only inline linkonce functions into other functions.</font></div>

<div><font face="arial, sans-serif">    // And not count it for module size growth.</font></div><div><font face="arial, sans-serif">    if (!F->hasLinkOnceLinkage())</font></div><div><font face="arial, sans-serif">      TotalSize += FS;</font></div>

<div><font face="arial, sans-serif">  }</font></div><div style="font-family:arial,sans-serif;font-size:13.63636302947998px"><br></div></div><div style="font-family:arial,sans-serif;font-size:13.63636302947998px">For SCC Inliner, it can be implemented just a SCC parsing order.</div>

<div style="font-family:arial,sans-serif;font-size:13.63636302947998px">For Greedy Inliner, it can be implemented by using ComputeCallSitesWeight(...) + GetBestCallSite(...).</div><div style="font-family:arial,sans-serif;font-size:13.63636302947998px">

<br></div><div style="font-family:arial,sans-serif;font-size:13.63636302947998px">2) Collect more benchmark data for more targets. e.g. x86, and more benchmarks like spec2006, geekbench, eembc, and others etc. And I can help to measure the performance impact on cortex-a57.</div>

<div style="font-family:arial,sans-serif;font-size:13.63636302947998px"><br></div><div style="font-family:arial,sans-serif;font-size:13.63636302947998px"><span style="font-family:arial;font-size:small">Thanks,</span><br></div>

<div>-Jiangning</div><div><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">2014-07-31 23:04 GMT+08:00 Hal Finkel <span dir="ltr"><<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span>:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="">----- Original Message -----<br>

> From: "Chandler Carruth" <<a href="mailto:chandlerc@google.com">chandlerc@google.com</a>><br>

> To: "Yin Ma" <<a href="mailto:yinma@codeaurora.org">yinma@codeaurora.org</a>><br>

</div><div class="">> Cc: "James Molloy" <<a href="mailto:james.molloy@arm.com">james.molloy@arm.com</a>>, "Nick Lewycky" <<a href="mailto:nicholas@mxc.ca">nicholas@mxc.ca</a>>, "Hal Finkel" <<a href="mailto:hfinkel@anl.gov">hfinkel@anl.gov</a>>,<br>


> <a href="mailto:apazos@codeaurora.org">apazos@codeaurora.org</a>, "Jiangning Liu" <<a href="mailto:Jiangning.Liu@arm.com">Jiangning.Liu@arm.com</a>>, "Commit Messages and Patches for LLVM"<br>


> <<a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a>><br>

> Sent: Wednesday, July 30, 2014 5:52:19 PM<br>

> Subject: Re: [PATCHES] A module inliner pass with a greedy call site queue<br>

><br>

><br>

><br>

><br>

><br>

</div><div class="">> On Wed, Jul 30, 2014 at 3:46 PM, Yin Ma < <a href="mailto:yinma@codeaurora.org">yinma@codeaurora.org</a> ><br>

> wrote:<br>

><br>

><br>

><br>

> We found the current SCC<br>

> inliner cannot inline a critical function in one of the SPEC2000<br>

> benchmarks<br>

> unless we increase the threshold to a very large number. Like A calls<br>

> B<br>

> calls C, The SCC inliner will start B -> C and inlined C into B,<br>

> however,<br>

> the performance gain is B to A and B is in a loop of A. However,<br>

> after<br>

> inlining C to B, B becomes very large so B cannot be inlined to A<br>

> with<br>

> default inline threshold.<br>

> This is a well known and studied limitation of the inliner. I gave a<br>

> talk at a prior dev meeting about it. There are specific ways to<br>

> address it within the existing inlining framework that I've been<br>

> working on for some time, but it requires infrastructure changes to<br>

> implement.<br>

><br>

<br>

</div>I realize that you've discussed this in the past, but perhaps a recap would be helpful. As I recall, the pass-manager rewrite will enable the inliner to make use of function-level analysis. Specifically, this will include making use of profiling data so that, with appropriate profiling data, we can avoid inlining cold call sites.<br>


<br>

On a theoretical basis, I'm not sure that addresses the same issue as top-down inlining, although there is obviously some overlap in effect. Specifically, if a goal of inlining is to push the critical path length as close as possible to the target's limit for efficient dispatch, but no farther, then I fail to see how pure bottom-up inlining can ever achieve that as well as a top-down approach. That having been said, I think that bottom-up inlining seems much more likely to expose optimization opportunities.<br>


<br>

In short, while I'm not at all convinced that Yin's heuristic is the right approach, I feel that a more in-depth discussion on the goals of inlining, and how we do, or plan to, achieve them, would be beneficial.<br>


<br>

Thanks again,<br>

Hal<br>

<div class="im HOEnZb"><br>

><br>

><br>

><br>

> I'm still very concerned that the code size impact and other impacts<br>

> have not been widely analyzed, but only narrowly analyzed. For<br>

> example, LLVM's inliner has historically shown over 10% code size<br>

> advantage across a wide range of benchmark C++ applications, large<br>

> and small. I don't think we want to sacrifice that.<br>

><br>

><br>

><br>

><br>

> I also don't think you should underestimate the impact of inlining on<br>

> virtual dispatch. The collapsing of indirect calls to direct calls<br>

> is *very* important and has been carefully tuned in LLVM's current<br>

> inliner.<br>

<br>

</div><div class="im HOEnZb">--<br>

Hal Finkel<br>

Assistant Computational Scientist<br>

Leadership Computing Facility<br>

Argonne National Laboratory<br>

</div><div class="HOEnZb"><div class="h5">_______________________________________________<br>

llvm-commits mailing list<br>

<a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>

</div></div></blockquote></div><br></div>