[PATCH] D15540: [PGO] differentiate FE instrumentation and IR level instrumentation profiles

Sean Silva via llvm-commits llvm-commits at lists.llvm.org
Tue Dec 15 17:02:45 PST 2015


On Tue, Dec 15, 2015 at 4:26 PM, Xinliang David Li <davidxl at google.com>
wrote:

> On Tue, Dec 15, 2015 at 3:58 PM, Sean Silva <chisophugis at gmail.com> wrote:
> > silvas added a comment.
> >
> >> llvm-profdata also handles the profiles differently: mainly for the
> MaxFunctionCount. For FE profile, it only needs to find the max of the
> entry count (the first count in the function). For IR level profile, the
> entry count might not be available, we will set it as the maximum block
> count in the profile.
> >
>
> A side note -- there is a separate effort going on to compute/set
> program level profile summary -- so the existing ones like this will
> eventually be obsolete.
>
> >
> > If I understand correctly, the motivation is that some functions might
> have already been inlined when IR-level instrumentation happens,
>
> No -- that is not the motivation. The real reason is that MST based
> instrumentation may skip entry BB completely such that the
> reader/writer does not know what count is for the entry -- only the
> profile-use knows about it.
>

Okay, that makes sense. Then it sounds like this is necessary. But the flag
should not be described as "is IR level". It should be "use max BB count as
proxy for max function count" or "entry bb may not have a counter" or
something like that. For example, if Clang were to start using MST approach
for its instrumentation, presumably it would need to set this flag too.

As a rule of thumb, the runtime should know *nothing* about whether the
instrumentation is IR-level or not. Both clang's instrumentation and the
IR-level instrumentation use the common instrprof_increment intrinsic and
common lowering, so it would be a layering violation for the runtime to
know a difference.

Features can of course be introduced, but the name of the feature should be
named for the meaning of the feature, not the fact that currently the
IR-level or clang instrumentation happens to do something one way or
another.

btw, I will be on PTO starting tomorrow and going until Jan 12, so please
wait for Justin to review. (I will not be able to review patches during
that time)

-- Sean Silva


>
>
>  and so we assume that any individual basic block could have been from
> an inlined function, so to recover that "function"'s count, we take
> the highest basic block count. This doesn't make sense. I expect
> MaxFunctionCount to be the maximum function count *at the point where
> the instrumentation was done*.
>
> yes It is by definition except that the writer does not have the value to
> write.
>
>
> And the instrumentation (whether Clang's stuff or the IR-level stuff)
> puts in an entry count on every function it sees at the point where
> the instrumentation was done.
> >
> > An example of why this matters is that many of my customers have some
> extremely hot functions like `operator+` on a Vec4 class. A big advantage
> of the advantage of the IR-level instrumentation (especially
> early-inlining) is to be able to get rid of these functions before the
> instrumentation happens.
>
> yes -- it will be huge win for these cases.
>
> I want MaxFunctionCount to represent the maximum function count
> *after* early inlining has cleaned up all the trivial inlinable
> functions. Or, to look at it from a different perspective, my clients
> use Vec4 as much as `int`; we do not want to count operator+ on Vec4
> for the same reason we do not keep a count for the built-in operator+
> on `int`, and we do not want MaxFunctionCount to be skewed by it.
> Early inlining is an elegant solution. I consider the current behavior
> a feature.
>
> There are a couple things to be aware of :
>
> 1) max-function-entry count is not a good global hotness signal for
> the whole program;
> 2) nor is function-entry count a good signal whether the function is
> good for inlining or not (even though this is the only profile based
> heuristic today). Per-callsite hotness is more important which is
> worked on separately.
>
> 3) the function-entry count (after inline cleanup etc) data is
> available with late instrumentation in profile-use stage and will be
> written as function entry meta data
>
>
> >
> > Or to put it another way, I think that my suggestion of "simply do not
> instrument the top 1% hottest functions" was a potential "quick fix", but
> the early inlining (+ other early optimizations) is a much more elegant and
> complete solution. Once we have characterized the benefits of early
> optimizations we will be able to more clearly evaluate whether we need to
> avoid instrumenting the hottest functions.
> >
>
> yes. Rong's early inline patch will follow up soon.
>
>
> > Right now, I think we should focus on integration into clang, since that
> is necessary for characterizing the effects of early optimizations so that
> we can decide on a good set of "early" passes to run to minimize
> instrumentation overhead. (I am glad to help out once it has been
> integrated into clang)
> >
>
> It will come up soon -- but the profile reader/writer change (which is
> straight forward) needs to come in first too.
>
> thanks,
>
> David
>
> >
> > http://reviews.llvm.org/D15540
> >
> >
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20151215/34c30225/attachment.html>


More information about the llvm-commits mailing list