[PATCH] D15540: [PGO] differentiate FE instrumentation and IR level instrumentation profiles

Xinliang David Li via llvm-commits llvm-commits at lists.llvm.org
Tue Dec 15 17:17:45 PST 2015


On Tue, Dec 15, 2015 at 5:02 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
> On Tue, Dec 15, 2015 at 4:26 PM, Xinliang David Li <davidxl at google.com>
> wrote:
>>
>> On Tue, Dec 15, 2015 at 3:58 PM, Sean Silva <chisophugis at gmail.com> wrote:
>> > silvas added a comment.
>> >
>> >> llvm-profdata also handles the profiles differently: mainly for the
>> >> MaxFunctionCount. For FE profile, it only needs to find the max of the entry
>> >> count (the first count in the function). For IR level profile, the entry
>> >> count might not be available, we will set it as the maximum block count in
>> >> the profile.
>> >
>>
>> A side note -- there is a separate effort going on to compute/set
>> program level profile summary -- so the existing ones like this will
>> eventually be obsolete.
>>
>> >
>> > If I understand correctly, the motivation is that some functions might
>> > have already been inlined when IR-level instrumentation happens,
>>
>> No -- that is not the motivation. The real reason is that MST based
>> instrumentation may skip entry BB completely such that the
>> reader/writer does not know what count is for the entry -- only the
>> profile-use knows about it.
>
>
> Okay, that makes sense. Then it sounds like this is necessary. But the flag
> should not be described as "is IR level". It should be "use max BB count as
> proxy for max function count" or "entry bb may not have a counter" or
> something like that. For example, if Clang were to start using MST approach
> for its instrumentation, presumably it would need to set this flag too.
>
> As a rule of thumb, the runtime should know *nothing* about whether the
> instrumentation is IR-level or not. Both clang's instrumentation and the
> IR-level instrumentation use the common instrprof_increment intrinsic and
> common lowering, so it would be a layering violation for the runtime to know
> a difference.
>

> Features can of course be introduced, but the name of the feature should be
> named for the meaning of the feature, not the fact that currently the
> IR-level or clang instrumentation happens to do something one way or
> another.
>

The runtime serves two producers that produce profile in the same
format (with the same version) -- it needs a way to tag the profile in
some way to avoid mixed uses.   It does not (nor has the need ) to
know the difference -- but just pass the tag through.

David


> btw, I will be on PTO starting tomorrow and going until Jan 12, so please
> wait for Justin to review. (I will not be able to review patches during that
> time)
>
> -- Sean Silva
>
>>
>>
>>
>>  and so we assume that any individual basic block could have been from
>> an inlined function, so to recover that "function"'s count, we take
>> the highest basic block count. This doesn't make sense. I expect
>> MaxFunctionCount to be the maximum function count *at the point where
>> the instrumentation was done*.
>>
>> yes It is by definition except that the writer does not have the value to
>> write.
>>
>>
>> And the instrumentation (whether Clang's stuff or the IR-level stuff)
>> puts in an entry count on every function it sees at the point where
>> the instrumentation was done.
>> >
>> > An example of why this matters is that many of my customers have some
>> > extremely hot functions like `operator+` on a Vec4 class. A big advantage of
>> > the advantage of the IR-level instrumentation (especially early-inlining) is
>> > to be able to get rid of these functions before the instrumentation happens.
>>
>> yes -- it will be huge win for these cases.
>>
>> I want MaxFunctionCount to represent the maximum function count
>> *after* early inlining has cleaned up all the trivial inlinable
>> functions. Or, to look at it from a different perspective, my clients
>> use Vec4 as much as `int`; we do not want to count operator+ on Vec4
>> for the same reason we do not keep a count for the built-in operator+
>> on `int`, and we do not want MaxFunctionCount to be skewed by it.
>> Early inlining is an elegant solution. I consider the current behavior
>> a feature.
>>
>> There are a couple things to be aware of :
>>
>> 1) max-function-entry count is not a good global hotness signal for
>> the whole program;
>> 2) nor is function-entry count a good signal whether the function is
>> good for inlining or not (even though this is the only profile based
>> heuristic today). Per-callsite hotness is more important which is
>> worked on separately.
>>
>> 3) the function-entry count (after inline cleanup etc) data is
>> available with late instrumentation in profile-use stage and will be
>> written as function entry meta data
>>
>>
>> >
>> > Or to put it another way, I think that my suggestion of "simply do not
>> > instrument the top 1% hottest functions" was a potential "quick fix", but
>> > the early inlining (+ other early optimizations) is a much more elegant and
>> > complete solution. Once we have characterized the benefits of early
>> > optimizations we will be able to more clearly evaluate whether we need to
>> > avoid instrumenting the hottest functions.
>> >
>>
>> yes. Rong's early inline patch will follow up soon.
>>
>>
>> > Right now, I think we should focus on integration into clang, since that
>> > is necessary for characterizing the effects of early optimizations so that
>> > we can decide on a good set of "early" passes to run to minimize
>> > instrumentation overhead. (I am glad to help out once it has been integrated
>> > into clang)
>> >
>>
>> It will come up soon -- but the profile reader/writer change (which is
>> straight forward) needs to come in first too.
>>
>> thanks,
>>
>> David
>>
>> >
>> > http://reviews.llvm.org/D15540
>> >
>> >
>> >
>
>


More information about the llvm-commits mailing list