<html><head><style type='text/css'>p { margin: 0; }</style></head><body><div style='font-family: arial,helvetica,sans-serif; font-size: 10pt; color: #000000'><br><hr id="zwchr"><blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px; color: rgb(0, 0, 0); font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><b>From: </b>"Xinliang David Li" <davidxl@google.com><br><b>To: </b>"Hal Finkel" <hfinkel@anl.gov><br><b>Cc: </b>"Artem Belevich" <tra@google.com>, "llvm-dev" <llvm-dev@lists.llvm.org>, "chandlerc" <chandlerc@gmail.com>, "Easwaran Raman" <eraman@google.com><br><b>Sent: </b>Thursday, March 10, 2016 11:00:30 AM<br><b>Subject: </b>Re: [llvm-dev] [RFC] Target-specific parametrization of function inliner<br><br><div dir="ltr">IMO, a good inliner with a precise cost/benefit model will eventually need what Art is proposing here. <div><br></div><div>Take the function call overhead as an example. It depends on a couple of factors: 1) call/return instruction latency; 2) function epilogue/prologue; 3) calling convention (argument passing, using registers or not, what register classes etc). All these factors depend on target information. If we want to go deeper, we know certain microarchitectures use a stack of call/return pairs to help branch prediction of ret instructions -- such a stack has a target-specific limit which can be triggered when a callsite is deep in the callchain. Register file size and the register pressure increase due to inlining are another example.</div><div><br></div><div id="DWT10649">Another relevant example is the icache/itlb sizes. A more precise analysis of the cost to 'speed' due to the icache/itlb pressure increase requires target information, profile information as well as some global analysis. 
Easwaran has done some research in this area in the past and can share the analysis design when other things are ready.</div></div></blockquote><br>I don't know what you mean by "when other things are ready", but what you say above sounds exactly right. I'm certainly curious what Easwaran has found.<br><br>Generally, there seem to be two categories here:<br><br> 1. Locally decidable issues, for which there are (or can be) good static heuristics (call latencies, costs associated with parameter passing, stack spilling, etc.)<br> 2. Globally decidable issues, like reducing the number of pages consumed by temporally-correlated hot code regions - profiling data likely necessary for good decision-making (although it might be possible to make a reasonable function-local threshold based on page size without it)<br><br>and then there are things like icache/itlb effects due to multiple applications running simultaneously, for which profiling might help, but are also policy-level decisions over which users may need more-direct control.<br><br><blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px; color: rgb(0, 0, 0); font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><div dir="ltr"><div></div><div class="gmail_extra"><br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"><br>
<br>
Hi Art,<br>
<br>
I've long thought that we should have a more principled way of doing inline profitability. There is obviously some cost to executing a function body, some call site overhead, and some cost reduction associated with any post-inlining simplifications. If inlining reduces the overall call site cost by more than some factor, say 1% (this should probably depend on the optimization level), then we should inline. With profiling information, we might even use global speedup instead of local speedup.<br></blockquote><div><br></div><div>yes -- with target specific cost information, global speedup analysis can be more precise :)</div><div> </div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<br>
Whether we need a target customization of this threshold, or just a way for a target to supplement the fine inlining decision, is unclear to me. It is also true that the result of a bunch of locally-optimal decisions might be far from the global optimum. Maybe the target has something to say about that?<br></blockquote><div><br></div><div><br></div><div id="DWT10365">The concept of threshold can be a topic of another discussion. In the current design, I think the threshold should remain target independent. It is the cost that is target specific.</div></div></div></div></blockquote>That's fine, but the units are important here. Having a target-independent threshold in terms of, roughly, instruction count makes little sense. How instruction count is correlated with either performance or code size is highly target specific (although it is certainly closer for code size). That, however, is roughly what our TTI.getUserCost gives us. Having target-independent thresholds like % speedup (e.g. inlining should be done when the speedup is > some %) or code-size thresholds (e.g. functions spanning more than 4 KB are bad) makes sense.<br><br> -Hal<br><blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px; color: rgb(0, 0, 0); font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><div><br></div><div>thanks,</div><div><br></div><div>David</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<br>
In short, I'm fine with what you're proposing, but to the extent possible, I want the numbers provided by the target to mean something. Replacing a global set of somewhat-arbitrary magic numbers with target-specific sets of somewhat-arbitrary magic numbers should be our last choice.<br>
<br>
Thanks again,<br>
Hal<br>
<br>
<br>
><br>
> Thanks,<br>
<span class="HOEnZb"><font color="#888888">> --<br>
><br>
><br>
> --Artem Belevich<br>
<br>
--<br>
Hal Finkel<br>
Assistant Computational Scientist<br>
Leadership Computing Facility<br>
Argonne National Laboratory<br>
</font></span></blockquote></div><br></div></div>
</blockquote><br><br><br>-- <br><div><span name="x"></span>Hal Finkel<br>Assistant Computational Scientist<br>Leadership Computing Facility<br>Argonne National Laboratory<span name="x"></span><br></div></div></body></html>