[llvm-dev] RFC: Synthetic function entry counts
Sean Silva via llvm-dev
llvm-dev at lists.llvm.org
Fri Dec 15 00:34:26 PST 2017
On Dec 12, 2017 5:02 PM, "Easwaran Raman via llvm-dev" <
llvm-dev at lists.llvm.org> wrote:
Functions in LLVM IR have a function_entry_count metadata that is attached
in PGO compilation. By using the entry count together with the block
frequency info, the compiler computes the profile count of call
instructions based on which the hotness/coldness of callsites can be
determined. Experiments have shown that using a higher threshold for hot
callsites results in improved runtime performance of the generated code
without significant code size increases. We propose to generate synthetic
function counts for non-PGO compilation and use the counts for boosting hot
callsites during inlining.
Synthetic function entry counts of functions are initialized based on
properties of the function such as whether it is visible outside the
module, whether it has an inline keyword and so on. Then, the callgraph SCC
is traversed in reverse post-order. Counts of callsites are determined
based on the entry count and the block frequency of the callsite. The
callsite count gets added to the entry count of the callee. For targets of
indirect calls, we will use the !callees metadata to find the possible
targets and distribute the count equally among them. For functions in a
non-trivial SCC, the algorithm has to ensure that the counts are stable and
deterministic.
Does this have any interaction with mayBeDerefined? E.g. different results
of static profile in different compilation units causing problems. I think
it should be okay since the transformations that look at the PGO data I
think are the ones responsible for being safe when !mayBeDerefined, but I
want to get a second opinion.
-- Sean Silva
In ThinLTO mode, the function summary contains the list of call edges from
the function. We propose to add the relative block frequency on these
edges. During the thinlink phase, we propagate the function counts on the
entire call graph and update the function summary with the synthetic
counts. Additionally, we plan to use the computed counts to drive the
importing decisions.
Alternative approach
-----------------------------
An alternative to generating synthetic counts is to make block frequency
info an inter-procedural analysis. Such an analysis would allow comparing
BFI of callsites in two different functions. This has several downsides:
-
The inter-procedural BFI computation is likely to be more expensive in
terms of compile-time.
-
Many function passes invalidate the BFI. This will require selective
invalidation of function BFIs.
-
Inliner correctly updates function counts of a callee after a callsite
is inlined. We can piggyback on this mechanism by using synthetic counts.
_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20171215/1dd0a5ac/attachment-0001.html>
More information about the llvm-dev
mailing list