[llvm-dev] RFC: Synthetic function entry counts
Xinliang David Li via llvm-dev
llvm-dev at lists.llvm.org
Thu Dec 14 13:44:32 PST 2017
On Thu, Dec 14, 2017 at 1:03 PM, Philip Reames <listmail at philipreames.com>
> This really sounds like it should be an analysis pass. Summarizing the
> results into the IR using the existing entry_county attributes sounds like
> a workaround for an invalidation problem more than anything else.
I think it is not a workaround. The strength of the proposal is that it
fits really naturally with the PGO infrastructure -- all the existing
profile update mechanism (currently mainly inliner) can be reused. Most of
the PGO related query interfaces can also be reused so that there is very
little change from the client side.
The inter-procedural frequency propagation also needs to happen in the
thinLink time for thinLTO which should not touch IR except summary data.
You acknowledge this in your alternatives discussion, but I don't follow
> why BFI invalidation at the function level has to invalidate the inter
> procedural result. You'd certainly need to use BFI when *running* your
> analysis, but once computed the results should be stable across updates
> within the function. You might need update logic for inlining, outlining,
> and mergefunc, but that should be it in terms of current LLVM transforms?
> On 12/12/2017 05:02 PM, Easwaran Raman via llvm-dev wrote:
> Functions in LLVM IR have a function_entry_count metadata that is attached
> in PGO compilation. By using the entry count together with the block
> frequency info, the compiler computes the profile count of call
> instructions based on which the hotness/coldness of callsites can be
> determined. Experiments have shown that using a higher threshold for hot
> callsites results in improved runtime performance of the generated code
> without significant code size increases. We propose to generate synthetic
> function counts for non-PGO compilation and use the counts for boosting hot
> callsites during inlining.
> Synthetic function entry counts of functions are initialized based on
> properties of the function such as whether it is visible outside the
> module, whether it has an inline keyword and so on. Then, the callgraph SCC
> is traversed in reverse post-order. Counts of callsites are determined
> based on the entry count and the block frequency of the callsite. The
> callsite count gets added to the entry count of the callee. For targets of
> indirect calls, we will use the !callees metadata to find the possible
> targets and distribute the count equally among them. For functions in a
> non-trivial SCC, the algorithm has to ensure that the counts are stable and
> In ThinLTO mode, the function summary contains the list of call edges from
> the function. We propose to add the relative block frequency on these
> edges. During the thinlink phase, we propagate the function counts on the
> entire call graph and update the function summary with the synthetic
> counts. Additionally, we plan to use the computed counts to drive the
> importing decisions.
> Alternative approach
> An alternative to generating synthetic counts is to make block frequency
> info an inter-procedural analysis. Such an analysis would allow comparing
> BFI of callsites in two different functions. This has several downsides:
> The inter-procedural BFI computation is likely to be more expensive in
> terms of compile-time.
> Many function passes invalidate the BFI. This will require selective
> invalidation of function BFIs.
> Inliner correctly updates function counts of a callee after a callsite
> is inlined. We can piggyback on this mechanism by using synthetic counts.
> LLVM Developers mailing listllvm-dev at lists.llvm.orghttp://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the llvm-dev