<div dir="ltr">IMO, the appropriate thing for TTI to inform the inliner about is how costly the actual act of a "call" is likely to be. I would hope that this would only be used on targets where there is some really dramatic overhead of actually doing a function call, such that the code size cost incurred by inlining is completely dwarfed by the improvements. GPUs are one of the few platforms that exhibit this kind of behavior, although I don't think they're truly unique, just a common example.<div><br></div><div>This isn't quite the same thing as the cost of the call instruction, which has much more to do with code size. Instead, it has to do with the expected consequences of actually leaving a call edge in the program.</div><div><br></div><div>To me, this closely mirrors the TTI hook we have for customizing loop unrolling, where the cost of having a cyclic CFG is modeled to indicate that on some targets (also GPUs) it is worth a very large amount of code size growth to simplify the control flow in a particular way.</div><div><br></div><div>Does that make sense to you, Hal? Based on that, it would really just be a scaling factor on the inline heuristics. I'm unsure how to express this construct more scientifically.</div><div><br></div><div>-Chandler</div></div><br><div class="gmail_quote"><div dir="ltr">On Thu, Mar 10, 2016 at 3:42 PM Hal Finkel via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">----- Original Message -----<br>

> From: "Artem Belevich via llvm-dev" <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>><br>

> To: "llvm-dev" <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>><br>

> Sent: Tuesday, March 1, 2016 6:31:06 PM<br>

> Subject: [llvm-dev] [RFC] Target-specific parametrization of function inliner<br>

><br>

> Hi,<br>

><br>

><br>

> I propose to make function inliner parameters adjustable for a specific<br>

> target.<br>

><br>

> Currently the function inlining pass appears to be target-agnostic, with<br>

> various constants for calculating call cost hardcoded. While it<br>

> works reasonably well for general-purpose CPUs, some quirkier<br>

> targets like NVPTX would benefit from target-specific tuning.<br>

><br>

><br>

> Currently it appears that there are two things that need to be done:<br>

><br>

><br>

> * add Inliner preferences to TargetTransformInfo in a way similar to<br>

> how we customize loop unrolling. Use it to provide the inliner with<br>

> target-specific thresholds and other parameters.<br>

> * augment the Inliner pass to use the existing TargetTransformInfo API to<br>

> figure out the cost of a particular call on a given target.<br>

> TargetTransformInfo already has getCallCost(), though it does not<br>

> look like anything uses it.<br>

><br>

><br>

> Comments? Concerns? Suggestions?<br>

><br>

<br>

Hi Art,<br>

<br>

I've long thought that we should have a more principled way of doing inline profitability. There is obviously some cost to executing a function body, some call site overhead, and some cost reduction associated with any post-inlining simplifications. If inlining reduces the overall call site cost by more than some factor, say 1% (this should probably depend on the optimization level), then we should inline. With profiling information, we might even use global speedup instead of local speedup.<br>
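<br>
As a rough illustration of the rule described above, the following is a minimal sketch, not anything that exists in LLVM. The struct CallSiteCosts, its fields, and the function shouldInline are hypothetical names, and the costs are assumed to be in some common abstract unit.<br>

```cpp
// Hypothetical cost summary for one call site. None of these names are
// LLVM APIs; all costs are assumed to share one abstract unit.
struct CallSiteCosts {
  double BodyCost;        // estimated cost of executing the callee body
  double CallOverhead;    // estimated cost of the call itself
  double SimplifySavings; // estimated cost removed by post-inlining simplification
};

// Returns true when inlining is estimated to reduce the total cost at this
// call site by more than MinRelativeGain (e.g. 0.01 for 1%).
bool shouldInline(const CallSiteCosts &C, double MinRelativeGain) {
  // Cost if the call is left in place: run the body and pay the overhead.
  double Before = C.BodyCost + C.CallOverhead;
  // Cost if inlined: no call overhead, and the body may simplify.
  double After = C.BodyCost - C.SimplifySavings;
  if (Before <= 0.0)
    return false; // nothing to gain from a free call site
  return (Before - After) / Before > MinRelativeGain;
}
```

With a 1% threshold, a body of cost 100 with call overhead 5 would be inlined (relative gain of about 4.8%), while a body of cost 1000 with the same overhead and no simplification would not (about 0.5%).<br>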

<br>

Whether we need a target customization of this threshold, or just a way for a target to supplement the fine inlining decision, is unclear to me. It is also true that the result of a bunch of locally-optimal decisions might be far from the global optimum. Maybe the target has something to say about that?<br>

<br>

In short, I'm fine with what you're proposing, but to the extent possible, I want the numbers provided by the target to mean something. Replacing a global set of somewhat-arbitrary magic numbers with target-specific sets of somewhat-arbitrary magic numbers should be our last choice.<br>
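<br>
To make the distinction concrete, here is a hedged sketch of the two styles of target hook. Everything in it is hypothetical: TargetInlineInfo, inlineThreshold, callOverhead, GpuLikeTarget, and inlineBenefit are illustrative names rather than the TargetTransformInfo interface, and the values 25 and 500 are placeholders, not measurements.<br>

```cpp
// Hypothetical target hook interface; NOT the real TargetTransformInfo API.
struct TargetInlineInfo {
  virtual ~TargetInlineInfo() = default;

  // Style A: the target overrides the threshold directly. The number has
  // no defined meaning outside the inliner's own heuristics.
  virtual int inlineThreshold(int DefaultThreshold) const {
    return DefaultThreshold;
  }

  // Style B: the target reports a quantity with a defined meaning, namely
  // the estimated cost of leaving a call in place, in the same abstract
  // units used for instruction costs. 25 is a placeholder value.
  virtual int callOverhead() const { return 25; }
};

// A target where calls are assumed to be very expensive (placeholder value).
struct GpuLikeTarget : TargetInlineInfo {
  int callOverhead() const override { return 500; }
};

// Estimated benefit of inlining one call site: the call overhead that
// disappears plus whatever simplification the inlined body allows.
int inlineBenefit(const TargetInlineInfo &T, int SimplifySavings) {
  return T.callOverhead() + SimplifySavings;
}
```

In the second style the inliner keeps a single target-independent decision rule, and only the measured or estimated call overhead varies per target.<br>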

<br>

Thanks again,<br>

Hal<br>

<br>

<br>

><br>

> Thanks,<br>

> --<br>

><br>

><br>

> --Artem Belevich<br>

> _______________________________________________<br>

> LLVM Developers mailing list<br>

> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

><br>

<br>

--<br>

Hal Finkel<br>

Assistant Computational Scientist<br>

Leadership Computing Facility<br>

Argonne National Laboratory<br>


</blockquote></div>