[PATCH] D15540: [PGO] differentiate FE instrumentation and IR level instrumentation profiles

Xinliang David Li via llvm-commits llvm-commits at lists.llvm.org
Tue Dec 15 16:26:25 PST 2015


On Tue, Dec 15, 2015 at 3:58 PM, Sean Silva <chisophugis at gmail.com> wrote:
> silvas added a comment.
>
>> llvm-profdata also handles the profiles differently: mainly for the MaxFunctionCount. For FE profile, it only needs to find the max of the entry count (the first count in the function). For IR level profile, the entry count might not be available, we will set it as the maximum block count in the profile.
>

A side note -- there is a separate effort going on to compute/set
program level profile summary -- so the existing ones like this will
eventually be obsolete.

>
> If I understand correctly, the motivation is that some functions might have already been inlined when IR-level instrumentation happens,

No -- that is not the motivation. The real reason is that MST based
instrumentation may skip entry BB completely such that the
reader/writer does not know what count is for the entry -- only the
profile-use knows about it.


 and so we assume that any individual basic block could have been from
an inlined function, so to recover that "function"'s count, we take
the highest basic block count. This doesn't make sense. I expect
MaxFunctionCount to be the maximum function count *at the point where
the instrumentation was done*.

yes It is by definition except that the writer does not have the value to write.


And the instrumentation (whether Clang's stuff or the IR-level stuff)
puts in an entry count on every function it sees at the point where
the instrumentation was done.
>
> An example of why this matters is that many of my customers have some extremely hot functions like `operator+` on a Vec4 class. A big advantage of the advantage of the IR-level instrumentation (especially early-inlining) is to be able to get rid of these functions before the instrumentation happens.

yes -- it will be huge win for these cases.

I want MaxFunctionCount to represent the maximum function count
*after* early inlining has cleaned up all the trivial inlinable
functions. Or, to look at it from a different perspective, my clients
use Vec4 as much as `int`; we do not want to count operator+ on Vec4
for the same reason we do not keep a count for the built-in operator+
on `int`, and we do not want MaxFunctionCount to be skewed by it.
Early inlining is an elegant solution. I consider the current behavior
a feature.

There are a couple things to be aware of :

1) max-function-entry count is not a good global hotness signal for
the whole program;
2) nor is function-entry count a good signal whether the function is
good for inlining or not (even though this is the only profile based
heuristic today). Per-callsite hotness is more important which is
worked on separately.

3) the function-entry count (after inline cleanup etc) data is
available with late instrumentation in profile-use stage and will be
written as function entry meta data


>
> Or to put it another way, I think that my suggestion of "simply do not instrument the top 1% hottest functions" was a potential "quick fix", but the early inlining (+ other early optimizations) is a much more elegant and complete solution. Once we have characterized the benefits of early optimizations we will be able to more clearly evaluate whether we need to avoid instrumenting the hottest functions.
>

yes. Rong's early inline patch will follow up soon.


> Right now, I think we should focus on integration into clang, since that is necessary for characterizing the effects of early optimizations so that we can decide on a good set of "early" passes to run to minimize instrumentation overhead. (I am glad to help out once it has been integrated into clang)
>

It will come up soon -- but the profile reader/writer change (which is
straight forward) needs to come in first too.

thanks,

David

>
> http://reviews.llvm.org/D15540
>
>
>


More information about the llvm-commits mailing list