[LLVMdev] RFC - Improvements to PGO profile support

Philip Reames listmail at philipreames.com
Wed Feb 25 10:52:08 PST 2015


On 02/24/2015 03:31 PM, Diego Novillo wrote:
>
> We (Google) have started to look more closely at the profiling 
> infrastructure in LLVM. Internally, we have a large dependency on PGO 
> to get peak performance in generated code.
>
> Some of the dependencies we have on profiling are still not present in 
> LLVM (e.g., the inliner) but we will still need to incorporate changes 
> to support our work on these optimizations. Some of the changes may be 
> addressed as individual bug fixes on the existing profiling 
> infrastructure. Other changes  may be better implemented as either new 
> extensions or as replacements of existing code.
>
> I think we will try to minimize infrastructure replacement at least in 
> the short/medium term. After all, it doesn't make too much sense to 
> replace infrastructure that is broken for code that doesn't exist yet.
>
> David Li and I are preparing a document where we describe the major 
> issues that we'd like to address. The document is a bit on the lengthy 
> side, so it may be easier to start with an email discussion.
I would personally be interested in seeing a copy of that document, but 
it might be more appropriate for a blog post than a discussion on 
llvm-dev.  I worry that we'd end up with a very unfocused discussion.  
It might be better to frame this as your plan of attack and reserve 
discussion on llvm-dev for things that are being proposed in the near 
term.  Just my 2 cents.

> This is a summary of the main changes we are looking at:
>
>  1. Need to faithfully represent the execution count taken from
>     dynamic profiles. Currently, MD_prof does not really represent an
>     execution count. This makes things like comparing hotness across
>     functions hard or impossible. We need a concept of global hotness.
>
What does MD_prof actually represent when used from Clang?  I know I've 
been using it for execution counters in my frontend.  Am I approaching 
that wrong?
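To make the question concrete, here's a toy sketch (plain Python, not the LLVM API; `branch_probabilities` is just an illustrative name) of why relative branch weights alone can't answer cross-function hotness questions:

```python
# MD_prof "branch_weights" encode relative likelihood. Once normalized
# to probabilities, any absolute-magnitude information is gone, so two
# functions with wildly different execution counts look identical.

def branch_probabilities(weights):
    """Convert MD_prof-style branch weights to branch probabilities."""
    total = sum(weights)
    return [w / total for w in weights]

# A 90/10 branch in a function that ran twice...
cold_fn = branch_probabilities([90, 10])
# ...is indistinguishable from the same split in a function run a million times.
hot_fn = branch_probabilities([900000, 100000])

assert cold_fn == hot_fn  # relative weights can't express global hotness
```

Which is exactly why a frontend that stores raw execution counters in MD_prof is encoding something the metadata wasn't really specified to carry.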

As a side comment: I'm a bit leery of a consistent notion of hotness 
based on counters across functions.  These counters are almost always 
approximate in practice, and counting problems run rampant.  I'd almost 
rather see a consistent count inferred from data that's assumed to be 
questionable than have the frontend try to generate consistent profiling 
metadata.  I think either approach could be made to work; we just need 
to think about it carefully.
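By "inferred from questionable data" I mean something in the spirit of the following toy sketch (illustrative names, not any existing LLVM API): trust the shape of the profile, and repair the magnitudes so they satisfy flow conservation.

```python
# Given a block's count and its (possibly drifted) outgoing-edge counts,
# rescale the edges so they sum exactly to the block's count. This keeps
# the profile's relative shape but restores internal consistency.

def reconcile_out_edges(block_count, edge_counts):
    total = sum(edge_counts)
    if total == 0:
        # No edge data at all: split the block count evenly as a fallback.
        n = len(edge_counts)
        return [block_count // n] * n
    return [round(block_count * e / total) for e in edge_counts]

# Counters drifted: the block executed 1000 times but its edges sum to 980.
fixed = reconcile_out_edges(1000, [700, 280])
assert sum(fixed) == 1000
```

A real implementation would iterate this over the whole CFG, but the principle is the same: derive one consistent story from inconsistent raw counters rather than trusting them verbatim.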
>
>  1. When the CFG or call graph changes, there needs to be an API for
>     incrementally updating/scaling counts. For instance, when a
>     function is inlined or partially inlined, or when the CFG is
>     modified. These counts need to be updated incrementally (or
>     perhaps re-computed as a first step in that direction).
>
Agreed.  Do you have a sense of how much of an issue this is in 
practice?  I haven't seen it kick in much, but it's also not something 
I've been looking for.
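For the inlining case specifically, the update being asked for is simple arithmetic. A sketch of one plausible scheme (hypothetical helper name, not an existing API): scale the inlined copy's block counts by the call site's share of the callee's entry count, and leave the remainder on the out-of-line copy.

```python
# When a callee with entry count `entry` is inlined at a call site
# executed `site` times, the inlined body should get site/entry of each
# block count, and the surviving out-of-line copy keeps the rest.

def split_counts_for_inlining(block_counts, entry, site):
    scale = site / entry if entry else 0.0
    inlined = {b: round(c * scale) for b, c in block_counts.items()}
    remaining = {b: c - inlined[b] for b, c in block_counts.items()}
    return inlined, remaining

counts = {"entry": 1000, "loop": 8000, "exit": 1000}
inlined, remaining = split_counts_for_inlining(counts, entry=1000, site=250)
assert inlined["loop"] == 2000 and remaining["loop"] == 6000
```

The subtle part in practice isn't the arithmetic; it's making sure every transform that duplicates or deletes blocks remembers to do it.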
>
>  1. The inliner (and other optimizations) needs to use profile
>     information and update it accordingly. This is predicated on
>     Chandler's work on the pass manager, of course.
>
It's worth noting that the inliner work can be done independently of the 
pass manager work.  We can always explicitly recompute the relevant 
analyses in the inliner if needed.  This will cost compile time, so we 
might need to make it an off-by-default option.  (Maybe -O3 only?)  
Being able to work on the inliner independently of the pass management 
structure is valuable enough that we should probably consider doing this.
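To be clear about the kind of decision I mean: this is not LLVM's actual inlining heuristic, just a sketch (made-up function and parameter names) of how profile data could feed the cost threshold.

```python
# Raise the inline size threshold for hot call sites and lower it for
# provably cold ones; leave lukewarm sites at the default.

def inline_threshold(base, site_count, program_max_count,
                     hot_mult=3.0, cold_mult=0.25, hot_frac=0.01):
    if program_max_count == 0:
        return base  # no profile data: fall back to the static threshold
    if site_count / program_max_count >= hot_frac:
        return base * hot_mult
    if site_count == 0:
        return base * cold_mult
    return base

assert inline_threshold(225, 50000, 1000000) == 675    # hot call site
assert inline_threshold(225, 0, 1000000) == 56.25      # cold call site
```

Note this is precisely where a global notion of hotness matters: `program_max_count` only means something if counts are comparable across functions.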

PGO inlining is an area I'm very interested in.  I'd really encourage 
you to work incrementally in tree.  I'm likely to start putting 
non-trivial amounts of time into this topic in the next few weeks.  I 
just need to clear a few things off my plate first.

Other than the inliner, can you list the passes you think it would be 
profitable to teach about profiling data?  My list so far is: PRE 
(particularly of loads!), the vectorizer (i.e., duplicate work down both 
a hot and a cold path when it can be vectorized on the hot path), 
LoopUnswitch, IRCE, and LoopUnroll (avoiding code-size explosion in cold 
code).  I'm much more interested in sources of improved performance than 
in code size reduction alone.  (Reducing code size can improve 
performance, of course.)
>
>  1. Need to represent global profile summary data. For example, for
>     global hotness determination, it is useful to compute additional
>     global summary info, such as a histogram of counts that can be
>     used to determine hotness and working set size estimates for a
>     large percentage of the profiled execution.
>
Er, not clear what you're trying to say here?
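If the intent is something like a profile summary a la AutoFDO — a histogram of counts from which you derive a global hotness cutoff — then a sketch of what I'd guess you mean (illustrative code, not a proposed API) would be:

```python
# Sort all block counts, then find the count value at which the hottest
# blocks cover (say) 99% of total execution. Counts above the returned
# threshold are "hot"; the number of such blocks estimates working set size.

def hotness_threshold(counts, coverage=0.99):
    total = sum(counts)
    running = 0
    for c in sorted(counts, reverse=True):
        running += c
        if running >= coverage * total:
            return c
    return 0

counts = [1000000, 500000, 300000, 50, 20, 5]
# The three hottest blocks cover over 99% of execution here.
assert hotness_threshold(counts) == 300000
```

If that's the idea, it would help to spell out in the document what summary statistics you want to persist and where (in the profile file, in module metadata, or both).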
> There are other changes that we will need to incorporate. David, 
> Teresa, Chandler, please add anything large that I missed.
>
> My main question at the moment is what would be the best way of 
> addressing them. Some seem to require new concepts to be implemented 
> (e.g., execution counts). Others could be addressed as simple bugs to 
> be fixed in the current framework.
>
> Would it make sense to present everything in a unified document and 
> discuss that? I've got some reservations about that approach because 
> we will end up discussing everything at once and it may not lead to 
> concrete progress. Another approach would be to present each issue 
> individually either as patches or RFCs or bugs.
See above.
>
> I will be taking on the implementation of several of these issues. 
> Some of them involve the SamplePGO harness that I added last year. I 
> would also like to know what other bugs or problems people have in 
> mind that I could also roll into this work.
>
>
> Thanks. Diego.
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
