[PATCH] D15540: [PGO] differentiate FE instrumentation and IR level instrumentation profiles

Sean Silva via llvm-commits llvm-commits at lists.llvm.org
Tue Dec 15 15:58:09 PST 2015


silvas added a comment.

> llvm-profdata also handles the profiles differently: mainly for the MaxFunctionCount. For FE profile, it only needs to find the max of the entry count (the first count in the function). For IR level profile, the entry count might not be available, we will set it as the maximum block count in the profile.


If I understand correctly, the motivation is that some functions might already have been inlined when IR-level instrumentation happens. Any individual basic block could therefore have come from an inlined function, so to recover that "function"'s count, we take the highest basic block count. This doesn't make sense. I expect MaxFunctionCount to be the maximum function count *at the point where the instrumentation was done*. And the instrumentation (whether Clang's stuff or the IR-level stuff) puts an entry count on every function it sees at the point where the instrumentation was done.
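To make the difference between the two policies concrete, here is a minimal Python sketch. It is not llvm-profdata's actual code: the profile layout (a dict from function name to a list of counters, with the entry count first), the function names, and the numbers are all assumptions for illustration.

```python
# Hypothetical profile: function name -> counters, entry count first.
profile = {
    "main": [1, 1000, 1000],
    "update": [1000, 50000, 20],
}


def max_function_count_entry(profile):
    """Policy described for FE profiles: max over each function's entry count."""
    return max(counts[0] for counts in profile.values())


def max_function_count_any_block(profile):
    """Policy proposed for IR-level profiles: max over every block counter."""
    return max(max(counts) for counts in profile.values())


print(max_function_count_entry(profile))      # 1000
print(max_function_count_any_block(profile))  # 50000
```

With a hot loop inside `update`, the block-based policy reports the loop's count rather than any function's entry count, which is the skew I am objecting to.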

An example of why this matters is that many of my customers have some extremely hot functions like `operator+` on a Vec4 class. A big advantage of the IR-level instrumentation (especially early-inlining) is to be able to get rid of these functions before the instrumentation happens. I want MaxFunctionCount to represent the maximum function count *after* early inlining has cleaned up all the trivially inlinable functions. Or, to look at it from a different perspective, my clients use Vec4 as much as `int`; we do not want to count operator+ on Vec4 for the same reason we do not keep a count for the built-in operator+ on `int`, and we do not want MaxFunctionCount to be skewed by it. Early inlining is an elegant solution. I consider the current behavior a feature.
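A toy illustration of the effect, again as a Python sketch with made-up names and counts: assume `Vec4::operator+` is fully inlined by the early passes, so it no longer exists as a function when the instrumentation runs and contributes no entry count.

```python
# Hypothetical entry counts per function if nothing is inlined first.
before_inlining = {
    "main": 1,
    "simulate": 1000,
    "Vec4::operator+": 4000000,
}


def drop_inlined(entry_counts, inlined):
    """Remove functions that early inlining eliminated before instrumentation."""
    return {name: c for name, c in entry_counts.items() if name not in inlined}


after_inlining = drop_inlined(before_inlining, {"Vec4::operator+"})

print(max(before_inlining.values()))  # 4000000
print(max(after_inlining.values()))   # 1000
```

In this sketch MaxFunctionCount drops from the count of the trivial helper to the count of the hottest function that still exists after early inlining.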

Or to put it another way, I think that my suggestion of "simply do not instrument the top 1% hottest functions" was a potential "quick fix", but the early inlining (+ other early optimizations) is a much more elegant and complete solution. Once we have characterized the benefits of early optimizations we will be able to more clearly evaluate whether we need to avoid instrumenting the hottest functions.

Right now, I think we should focus on integration into clang, since that is necessary for characterizing the effects of early optimizations so that we can decide on a good set of "early" passes to run to minimize instrumentation overhead. (I am glad to help out once it has been integrated into clang.)


http://reviews.llvm.org/D15540




