[LLVMdev] RFC - Improvements to PGO profile support

Xinliang David Li davidxl at google.com
Wed Feb 25 12:40:26 PST 2015


On Wed, Feb 25, 2015 at 10:52 AM, Philip Reames
<listmail at philipreames.com> wrote:
> On 02/24/2015 03:31 PM, Diego Novillo wrote:
>
>
> We (Google) have started to look more closely at the profiling
> infrastructure in LLVM. Internally, we have a large dependency on PGO to get
> peak performance in generated code.
>
> Some of the dependencies we have on profiling are still not present in LLVM
> (e.g., the inliner) but we will still need to incorporate changes to support
> our work on these optimizations. Some of the changes may be addressed as
> individual bug fixes on the existing profiling infrastructure. Other changes
> may be better implemented as either new extensions or as replacements of
> existing code.
>
> I think we will try to minimize infrastructure replacement at least in the
> short/medium term. After all, it doesn't make too much sense to replace
> infrastructure that is broken for code that doesn't exist yet.
>
> David Li and I are preparing a document where we describe the major issues
> that we'd like to address. The document is a bit on the lengthy side, so it
> may be easier to start with an email discussion.
>
> I would personally be interested in seeing a copy of that document, but it
> might be more appropriate for a blog post than a discussion on llvm-dev.  I
> worry that we'd end up with a very unfocused discussion.  It might be better
> to frame this as your plan of attack and reserve discussion on llvm-dev for
> things that are being proposed semi-near-term.  Just my 2 cents.
>
> This is a summary of the main changes we are looking at:
>
> Need to faithfully represent the execution count taken from dynamic
> profiles. Currently, MD_prof does not really represent an execution count.
> This makes things like comparing hotness across functions hard or
> impossible. We need a concept of global hotness.
>
> What does MD_prof actually represent when used from Clang?  I know I've been
> using it for execution counters in my frontend.  Am I approaching that
> wrong?
>
> As a side comment: I'm a bit leery of a consistent notion of hotness based
> on counters across functions.  These counters are almost always approximate
> in practice, and counting problems run rampant.

Having representative training runs is a prerequisite for using FDO/PGO.

>  I'd
> almost rather see a consistent count inferred from data that's assumed to be
> questionable than
> make the frontend try to generate consistent profiling
> metadata.

The frontend does not generate profile data -- it is just a messenger that
should pass the data faithfully to the middle end. That messenger (the
profile reader) can live in the middle end, too.
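
For concreteness, here is a minimal sketch of how MD_prof branch weights are
attached today via the existing MDBuilder helper. The weights are relative
taken/not-taken ratios, not absolute execution counts, which is exactly why
they cannot be compared across functions; the helper function itself is
illustrative only:

  // Minimal sketch: attaching relative branch weights as MD_prof metadata.
  // MDBuilder::createBranchWeights and LLVMContext::MD_prof are existing
  // LLVM APIs; the surrounding helper is just for illustration.
  #include "llvm/IR/Instructions.h"
  #include "llvm/IR/LLVMContext.h"
  #include "llvm/IR/MDBuilder.h"
  #include <cstdint>

  using namespace llvm;

  void annotateBranch(BranchInst *BI, uint32_t TakenWeight,
                      uint32_t NotTakenWeight) {
    MDBuilder MDB(BI->getContext());
    // The stored values are 32-bit relative weights, not absolute execution
    // counts, so raw profile counts have to be scaled to fit.
    BI->setMetadata(LLVMContext::MD_prof,
                    MDB.createBranchWeights(TakenWeight, NotTakenWeight));
  }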

>  I think either approach could be made to work, we just need to
> think about it carefully.
>
> When the CFG or call graph changes, there needs to be an API for
> incrementally updating/scaling counts. For instance, when a function is
> inlined or partially inlined, when the CFG is modified, etc. These counts
> need to be updated incrementally (or perhaps re-computed as a first step
> in that direction).
>
> Agreed.  Do you have a sense of how much of an issue this is in practice?  I
> haven't seen it kick in much, but it's also not something I've been looking
> for.
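
To make the kind of incremental update concrete, scaling a callee's counts
when a call site is inlined might look roughly like the sketch below. This is
a hypothetical illustration (the container, names, and API are not existing
LLVM code):

  // Hypothetical sketch: scale a callee's block counts by the fraction of
  // its executions attributable to the inlined call site, so the inlined
  // copy and the remaining out-of-line copy stay roughly consistent.
  #include <cstdint>
  #include <map>
  #include <string>

  using BlockCounts = std::map<std::string, uint64_t>;

  BlockCounts scaleForInlining(const BlockCounts &CalleeCounts,
                               uint64_t CallSiteCount,
                               uint64_t CalleeEntryCount) {
    BlockCounts Scaled;
    if (CalleeEntryCount == 0)
      return Scaled; // No callee profile; nothing meaningful to scale.
    for (const auto &KV : CalleeCounts)
      // Integer scaling; a real implementation would guard against overflow
      // and distribute rounding error more carefully.
      Scaled[KV.first] = KV.second * CallSiteCount / CalleeEntryCount;
    return Scaled;
  }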
>
> The inliner (and other optimizations) needs to use profile information and
> update it accordingly. This is predicated on Chandler's work on the pass
> manager, of course.
>
> It's worth noting that the inliner work can be done independently of the pass
> manager work.  We can always explicitly recompute the relevant analyses in
> the inliner if needed.  This will cost compile time, so we might need to make
> this an off-by-default option.  (Maybe -O3 only?)  Being able to work on the
> inliner independently of the pass management structure is valuable enough
> that we should probably consider doing this.
>
> PGO inlining is an area I'm very interested in.  I'd really encourage you to
> work incrementally in tree.  I'm likely to start putting non-trivial amounts
> of time into this topic in the next few weeks.  I just need to clear a few
> things off my plate first.
>
> Other than the inliner, can you list the passes you think are profitable to
> teach about profiling data?  My list so far is: PRE (particularly of
> loads!), the vectorizer (i.e. duplicate work down both a hot and cold path
> when it can be vectorized on the hot path), LoopUnswitch, IRCE, & LoopUnroll
> (avoiding code size explosion in cold code).  I'm much more interested in
> sources of improved performance than I am in code size reduction alone.
> (Reducing code size can improve performance of course.)

PGO is very effective at code size reduction. In reality, a large
percentage of functions are globally cold.
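
As an illustration only (a hypothetical sketch, not existing LLVM code), cold
functions could be steered toward size optimization roughly like this:

  // Hypothetical sketch: mark functions whose entry count falls below a
  // hotness threshold as cold and optimize-for-size.  Attribute::Cold and
  // Attribute::OptimizeForSize exist in LLVM; getEntryCount here is an
  // assumed callback, not an existing API.
  #include "llvm/IR/Function.h"
  #include "llvm/IR/Module.h"
  #include <cstdint>
  #include <functional>

  using namespace llvm;

  void markColdFunctions(Module &M,
                         std::function<uint64_t(const Function &)> getEntryCount,
                         uint64_t HotThreshold) {
    for (Function &F : M) {
      if (F.isDeclaration())
        continue;
      if (getEntryCount(F) < HotThreshold) {
        F.addFnAttr(Attribute::Cold);
        F.addFnAttr(Attribute::OptimizeForSize);
      }
    }
  }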

David
>
> Need to represent global profile summary data. For example, for global
> hotness determination, it is useful to compute additional global summary
> info, such as a histogram of counts that can be used to derive hotness
> thresholds and working-set-size estimates covering a large percentage of
> the profiled execution.
>
> Er, not clear what you're trying to say here?
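
For illustration, the kind of summary described above (a histogram of counts
used to derive hotness and working-set estimates) might be computed roughly
as in this hypothetical sketch (not existing LLVM code):

  // Hypothetical sketch: compute a "hot" count threshold by sorting counts
  // in descending order and finding the smallest count value needed to
  // cover, say, 99% of the total profiled execution (a working-set cutoff).
  #include <algorithm>
  #include <cstdint>
  #include <functional>
  #include <vector>

  uint64_t computeHotThreshold(std::vector<uint64_t> Counts,
                               double CoveragePercent = 99.0) {
    std::sort(Counts.begin(), Counts.end(), std::greater<uint64_t>());
    uint64_t Total = 0;
    for (uint64_t C : Counts)
      Total += C;
    uint64_t Target =
        static_cast<uint64_t>(Total * (CoveragePercent / 100.0));
    uint64_t Running = 0;
    for (uint64_t C : Counts) {
      Running += C;
      if (Running >= Target)
        return C; // Counts at or above this value are treated as hot.
    }
    return 0;
  }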
>
> There are other changes that we will need to incorporate. David, Teresa,
> Chandler, please add anything large that I missed.
>
> My main question at the moment is what would be the best way of addressing
> them. Some seem to require new concepts to be implemented (e.g., execution
> counts). Others could be addressed as simple bugs to be fixed in the current
> framework.
>
> Would it make sense to present everything in a unified document and discuss
> that? I've got some reservations about that approach because we will end up
> discussing everything at once and it may not lead to concrete progress.
> Another approach would be to present each issue individually either as
> patches or RFCs or bugs.
>
> See above.
>
>
> I will be taking on the implementation of several of these issues. Some of
> them involve the SamplePGO harness that I added last year. I would also like
> to know what other bugs or problems people have in mind that I could also
> roll into this work.
>
>
> Thanks. Diego.
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>


