[llvm-dev] [RFC] Target-specific parametrization of function inliner

Hal Finkel via llvm-dev llvm-dev at lists.llvm.org
Fri Apr 1 12:35:39 PDT 2016


----- Original Message -----

> From: "Mehdi Amini via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "Xinliang David Li" <davidxl at google.com>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Friday, April 1, 2016 2:26:27 PM
> Subject: Re: [llvm-dev] [RFC] Target-specific parametrization of
> function inliner

> > On Mar 10, 2016, at 10:34 AM, Xinliang David Li via llvm-dev <
> > llvm-dev at lists.llvm.org > wrote:
> 

> > On Thu, Mar 10, 2016 at 6:49 AM, Chandler Carruth <
> > chandlerc at google.com > wrote:
> 

> > > IMO, the appropriate thing for TTI to inform the inliner about is how
> > > costly the actual act of a "call" is likely to be. I would hope that
> > > this would only be used on targets where there is some really
> > > dramatic overhead of actually doing a function call such that the
> > > code size cost incurred by inlining is completely dwarfed by the
> > > improvements. GPUs are one of the few platforms that exhibit this
> > > kind of behavior, although I don't think they're truly unique, just
> > > a common example.
> > >
> > > This isn't quite the same thing as the cost of the call instruction,
> > > which has much more to do with the size. Instead, it has to do with
> > > the expected consequences of actually leaving a call edge in the
> > > program.
> > >
> > > To me, this pretty accurately reflects the TTI hook we have for
> > > customizing loop unrolling, where the cost of having a cyclic CFG is
> > > modeled to help indicate that on some targets (also GPUs) it is
> > > worth a very large amount of code size growth to simplify the
> > > control flow in a particular way.

> > From 10,000 feet, the LLVM inliner implements a size-based heuristic:
> > if the inline instance's size*/cost after simplification via
> > propagating the call context (actually the relative size -- the
> > callsite cost is subtracted from it) is smaller than a threshold
> > (adjusted from a base value), then the callsite is considered an
> > inline candidate. In most cases, the decision is made locally due to
> > the bottom-up order (there are tweaks to bypass it). The size/cost
> > can be remotely tied to, and serves as a proxy for, the real
> > runtime cost due to icache/itlb effects, but it seems the
> > size/threshold scheme is mainly used to model the runtime speedup vs
> > compile time/binary size tradeoffs.
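
The size/threshold scheme described above can be sketched roughly as
follows. This is only an illustration of the arithmetic, not the actual
inliner code: the struct, its field names, and both function names are
hypothetical, and the numbers a caller passes in are made up.

```cpp
// Hypothetical summary of one call site. Field names are illustrative
// only and do not correspond to real LLVM data structures.
struct CallSiteEstimate {
  int InlinedBodyCost; // size/cost of the callee body after simplification
  int CallSiteCost;    // cost of the call itself, which inlining removes
};

// The "relative size": the callsite cost is subtracted from the body cost.
int relativeInlineCost(const CallSiteEstimate &CS) {
  return CS.InlinedBodyCost - CS.CallSiteCost;
}

// A callsite is an inline candidate when the relative cost is smaller
// than the (already adjusted) threshold.
bool isInlineCandidate(const CallSiteEstimate &CS, int Threshold) {
  return relativeInlineCost(CS) < Threshold;
}
```

Because the decision only looks at one callsite and one threshold, it is
local in the sense described above.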
> 

> Other than the call cost itself, I've been surprised that the TTI is
> not more involved when it comes to this tradeoff: instructions don't
> have the same tradeoff depending on the platform (oh this operation
> is not legal on this type and will be expanded in multiple
> instructions in SDAG, too bad..).

I think the intent was to do this, but we've not done it yet (unlike the
throughput model used for vectorization, where we did). I think we
should. I also think we should combine the cost models so that we have a
single model that returns multiple kinds of costs (throughput, size,
latency, etc.).

-Hal 
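
As a rough sketch of what "a single model that returns multiple kinds of
costs" could look like: everything below (the CostKind and Op enums, the
InstructionCosts struct, costOf, and all the numbers) is hypothetical and
invented for illustration. It is not an existing TTI interface.

```cpp
// Hypothetical cost kinds, named after the ones listed in the message.
enum class CostKind { Throughput, Size, Latency };

// All cost kinds for one operation, produced by a single model.
struct InstructionCosts {
  int Throughput;
  int Size;
  int Latency;

  int get(CostKind K) const {
    switch (K) {
    case CostKind::Throughput:
      return Throughput;
    case CostKind::Size:
      return Size;
    case CostKind::Latency:
      return Latency;
    }
    return 0; // unreachable for valid enum values
  }
};

// Made-up operation classes for a made-up target.
enum class Op { Add, Divide, ExpandedVectorOp };

// One target-specific model answers every kind of query. The numbers are
// placeholders; ExpandedVectorOp stands for an operation that is not
// legal on the target and gets expanded into several instructions.
InstructionCosts costOf(Op O) {
  switch (O) {
  case Op::Add:
    return {1, 1, 1};
  case Op::Divide:
    return {8, 1, 20};
  case Op::ExpandedVectorOp:
    return {6, 6, 10};
  }
  return {1, 1, 1};
}
```

A size-driven client such as the inliner would ask for CostKind::Size,
while the vectorizer would keep asking for CostKind::Throughput, both
against the same table.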

> --
> Mehdi

> > Setting aside what we need longer term for the inliner, the
> > GPU-specific problems can be addressed by:
> >
> > 1) if the call overhead is really large, define a target-specific
> > getCallCost and subtract it from the initial Cost when analyzing a
> > callsite (this will help boost all targets with high call costs)
> >
> > 2) if not, but instead GPU users can tolerate large code growth, then
> > it is better to do this by adjusting the threshold -- perhaps have a
> > user-level option -finline-limit=?
> >
> > thanks,
> >
> > David
> >
> > * some target-dependent info may be used: TTI.getUserCost
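
The two options above differ in which side of the comparison they move.
A minimal sketch, assuming a hypothetical per-target struct and helper
names (none of these are real LLVM APIs, and the costs are invented):

```cpp
// Hypothetical per-target parameter: how expensive a call is.
struct TargetInlineParams {
  int CallCost;
};

// Option 1: subtract the target-specific call cost from the initial cost.
int initialCost(int BodyCost, const TargetInlineParams &T) {
  return BodyCost - T.CallCost;
}

// Option 2: leave the cost alone and let the user raise the threshold.
// A negative UserLimit means "no user-level limit was given".
int effectiveThreshold(int DefaultThreshold, int UserLimit) {
  return UserLimit >= 0 ? UserLimit : DefaultThreshold;
}

bool shouldInline(int BodyCost, const TargetInlineParams &T, int Threshold) {
  return initialCost(BodyCost, T) < Threshold;
}
```

With a made-up body cost of 500 and a threshold of 225, a target with a
call cost of 25 would not inline, while one with a call cost of 400
would. Raising the threshold to 1000 through the user-level limit makes
both inline.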
> 

> > > Does that make sense to you Hal? Based on that, it would really just
> > > be a scaling factor of the inline heuristics. Unsure of how to more
> > > scientifically express this construct.
> > >
> > > -Chandler
> > >
> > > On Thu, Mar 10, 2016 at 3:42 PM Hal Finkel via llvm-dev <
> > > llvm-dev at lists.llvm.org > wrote:

> > > > ----- Original Message -----
> > > > > From: "Artem Belevich via llvm-dev" < llvm-dev at lists.llvm.org >
> > > > > To: "llvm-dev" < llvm-dev at lists.llvm.org >
> > > > > Sent: Tuesday, March 1, 2016 6:31:06 PM
> > > > > Subject: [llvm-dev] [RFC] Target-specific parametrization of
> > > > > function inliner
> > > > >
> > > > > Hi,
> > > > >
> > > > > I propose to make function inliner parameters adjustable for a
> > > > > specific target.
> > > > >
> > > > > Currently the function inlining pass appears to be target-agnostic,
> > > > > with various constants for calculating call cost hardcoded. While
> > > > > it works reasonably well for general purpose CPUs, some quirkier
> > > > > targets like NVPTX would benefit from target-specific tuning.
> > > > >
> > > > > Currently it appears that there are two things that need to be
> > > > > done:
> > > > >
> > > > > * add Inliner preferences to TargetTransformInfo in a way similar
> > > > > to how we customize loop unrolling. Use it to provide the inliner
> > > > > with target-specific thresholds and other parameters.
> > > > > * augment the Inliner pass to use the existing TargetTransformInfo
> > > > > API to figure out the cost of a particular call on a given target.
> > > > > TargetTransformInfo already has getCallCost(), though it does not
> > > > > look like anything uses it.
> > > > >
> > > > > Comments? Concerns? Suggestions?
> > > > >

> > > > Hi Art,
> > > >
> > > > I've long thought that we should have a more principled way of doing
> > > > inline profitability. There is obviously some cost to executing a
> > > > function body, some call site overhead, and some cost reduction
> > > > associated with any post-inlining simplifications. If inlining
> > > > reduces the overall call site cost by more than some factor, say 1%
> > > > (this should probably depend on the optimization level), then we
> > > > should inline. With profiling information, we might even use global
> > > > speedup instead of local speedup.
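
One way to read the profitability rule above, as a sketch only: compare
the cost of the call site with and without inlining and require a
minimum relative reduction. The struct, the function names, and the way
profile data is folded in are all assumptions made for illustration.

```cpp
// Hypothetical per-call-site estimates, in arbitrary cost units.
struct CallSiteCosts {
  double BodyCost;              // cost of executing the function body
  double CallOverhead;          // call site overhead removed by inlining
  double SimplificationSavings; // reduction from post-inlining simplification
};

double costIfNotInlined(const CallSiteCosts &C) {
  return C.BodyCost + C.CallOverhead;
}

double costIfInlined(const CallSiteCosts &C) {
  return C.BodyCost - C.SimplificationSavings;
}

// Local speedup: fraction of the call site cost that inlining removes.
double relativeReduction(const CallSiteCosts &C) {
  double Before = costIfNotInlined(C);
  if (Before <= 0.0)
    return 0.0; // nothing to save at a free call site
  return (Before - costIfInlined(C)) / Before;
}

// Inline when the reduction exceeds some factor, e.g. 0.01 for 1%.
bool isProfitable(const CallSiteCosts &C, double MinReduction) {
  return relativeReduction(C) > MinReduction;
}

// With profile data (assumed here to be an execution count and a total
// program cost), the same saving can be weighed against the whole program.
double globalReduction(const CallSiteCosts &C, double CallCount,
                       double TotalProgramCost) {
  if (TotalProgramCost <= 0.0)
    return 0.0;
  return CallCount * (costIfNotInlined(C) - costIfInlined(C)) /
         TotalProgramCost;
}
```

MinReduction is the knob that would depend on the optimization level;
the target would supply the cost estimates rather than the threshold.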
> > > 
> > 
> 

> > > > Whether we need a target customization of this threshold, or
> > > > just
> > > > a
> > > > way for a target to supplement the fine inlining decision, is
> > > > unclear to me. It is also true that a the result of a bunch of
> > > > locally-optimal decisions might be far from the global optimum.
> > > > Maybe the target has something to say about that?
> > > 
> > 
> 

> > > > In short, I'm fine with what you're proposing, but to the
> > > > extent
> > > > possible, I want the numbers provided by the target to mean
> > > > something. Replacing a global set of somewhat-arbitrary magic
> > > > numbers, with target-specific sets of somewhat-arbitrary magic
> > > > numbers should be our last choice.
> > > 
> > 
> 

> > > > Thanks again,
> > > 
> > 
> 
> > > > Hal
> > > 
> > 
> 

> > > > >
> > > 
> > 
> 
> > > > > Thanks,
> > > 
> > 
> 
> > > > > --
> > > 
> > 
> 
> > > > >
> > > 
> > 
> 
> > > > >
> > > 
> > 
> 
> > > > > --Artem Belevich
> > > 
> > 
> 
> > > > > _______________________________________________
> > > 
> > 
> 
> > > > > LLVM Developers mailing list
> > > 
> > 
> 
> > > > > llvm-dev at lists.llvm.org
> > > 
> > 
> 
> > > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > > 
> > 
> 
> > > > >
> > > 
> > 
> 

> > > > --
> > > 
> > 
> 
> > > > Hal Finkel
> > > 
> > 
> 
> > > > Assistant Computational Scientist
> > > 
> > 
> 
> > > > Leadership Computing Facility
> > > 
> > 
> 
> > > > Argonne National Laboratory
> > > 
> > 
> 



-- 

Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 