[LLVMdev] RFC - Improvements to PGO profile support
Xinliang David Li
davidxl at google.com
Fri Mar 6 19:00:58 PST 2015
Bob, Philip, thanks for the feedback.
Diego is planning to give a more detailed reply next Monday. There seems
to be some misunderstanding about the proposals, so I will just give
some highlights here:
1) The proposal does not intend to fundamentally change the current
framework, but to enhance it so that
a) more profile information is preserved
b) block/edge counts/frequencies become faster to compute
c) profile information becomes faster to access and update
2) Changes to profile APIs and profile client code will be minimized,
except that we will add IPA clients (once Chandler's pass manager
change is ready)
3) The proposed change does *not* give up relative hotness as
mentioned by Philip. All clients that rely on relative hotness are
unaffected -- except that the data is better and more reliable.
4) With real profile data available, the current infrastructure does *not*
provide reasonable hotness (e.g., try comparing BBs that execute the
same number of times but sit in loops of different depths in the same
function, and see how large the difference is), let alone provide it quickly.
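To make point 4 concrete, here is a minimal sketch (not LLVM's actual BlockFrequencyInfo; the fixed trip-count guess and function names are illustrative assumptions) of how frequencies reconstructed from branch probabilities alone can diverge for blocks that the real profile says ran equally often:

```python
# Hypothetical illustration: block "frequency" is reconstructed from branch
# probabilities by assuming every loop runs a fixed, statically guessed
# trip count (EST_TRIP is an assumption, not an LLVM constant).
EST_TRIP = 10

def est_freq(loop_depth, entry_freq=1.0):
    # Estimated frequency: entry frequency scaled once per loop level.
    return entry_freq * EST_TRIP ** loop_depth

# Suppose the real profile says two blocks both executed 1000 times,
# but block A sits in a depth-1 loop and block B in a depth-3 loop.
freq_a = est_freq(1)   # 10.0
freq_b = est_freq(3)   # 1000.0

# Same real count, yet 100x apart in estimated hotness:
print(freq_a, freq_b, freq_b / freq_a)
```

The point is only that, without real counts, the relative hotness of the two blocks depends entirely on how well the static trip-count guesses match reality.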
I am reasonably confident that the proposal
1) does not affect compilations using static profiles (with branch prediction), and
2) is strictly better for -fprofile-instr-use optimizations.
The area I am less sure about is the JIT, but I am very interested in
learning the details and proposing solutions if the current proposal
does not work for you (which I doubt -- because if the current
framework works, the new one should work too :) ).
I am looking forward to more detailed discussions next week! We shall
sit down together and discuss the changes, rationale, and concerns one
by one -- with concrete examples.
On Fri, Mar 6, 2015 at 5:49 PM, Philip Reames <listmail at philipreames.com> wrote:
> On 03/05/2015 08:29 AM, Bob Wilson wrote:
> On Mar 2, 2015, at 4:19 PM, Diego Novillo <dnovillo at google.com> wrote:
> On Thu, Feb 26, 2015 at 6:54 PM, Diego Novillo <dnovillo at google.com> wrote:
>> I've created a few bugzilla issues with details of some of the things I'll
>> be looking into. I'm not yet done wordsmithing the overall design document.
>> I'll try to finish it by early next week at the latest.
> The document is available at
> There are several topics covered. Ideally, I would prefer that we discuss
> each topic separately. The main ones I will start working on are the ones
> described in the bugzilla links we have in the doc.
> This is just a starting point for us. I am not at all concerned with
> implementing exactly what is proposed in the document. In fact, if we can
> get the same value using the existing support, all the better.
> OTOH, any other ideas that folks may have that work better than this are
> more than welcome. I don't have really strong opinions on the matter. I am
> fine with whatever works.
> Thanks for the detailed write-up on this. Some of the issues definitely need
> to be addressed. I am concerned, though, that some of the ideas may be
> leading toward a scenario where we have essentially two completely different
> ways of representing profile information in LLVM IR. It is great to have two
> complementary approaches to collecting profile data, but two representations
> in the IR would not make sense.
> The first issue raised is that profile execution counts are not represented
> in the IR. This was a very intentional decision. I know it goes against what
> other compilers have done in the past. It took me a while to get used to the
> idea when Andy first suggested it, so I know it seems awkward at first. The
> advantage is that branch probabilities are much easier to keep updated in
> the face of compiler transformations, compared to execution counts. We are
> definitely missing the per-function execution counts that are needed to be
> able to compare relative “hotness” across functions, and I think that would
> be a good place to start making improvements. In the long term, we should
> keep our options open to making major changes, but before we go there, we
> should try to make incremental improvements to fix the existing infrastructure.
> Many of the other issues you raise seem like they could also be addressed
> without major changes to the existing infrastructure. Let’s try to fix those first.
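Bob's argument that branch probabilities are easier to keep updated than execution counts can be sketched as follows (a toy illustration, not LLVM API; the 60/40 split ratio is an assumed input a real transformation would have to compute):

```python
# Sketch: why probabilities survive block duplication while counts must be
# re-apportioned. A block with execution count 100 and a 70/30 branch is
# duplicated (e.g., tail duplication) into two copies reached 60% / 40%
# of the time.

orig_count = 100
taken_prob = 0.7  # branch probability: independent of how often we arrive

# After duplication, each copy keeps the SAME branch probability unchanged...
copy_a_prob = taken_prob
copy_b_prob = taken_prob

# ...but execution counts must be split, and the split ratio is extra state
# the transformation has to compute and thread through every copy.
copy_a_count = orig_count * 0.6
copy_b_count = orig_count * 0.4

print(copy_a_prob, copy_b_prob)      # 0.7 0.7
print(copy_a_count + copy_b_count)   # 100.0
```

The maintenance burden Bob describes is exactly the second half: counts impose a conservation invariant on every transformation, probabilities mostly do not.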
> After reading the document, I agree with Bob's perspective here.
> I would strongly recommend that you start with the optimizations that can be
> implemented within the current framework. The current infrastructure gives
> a fairly reasonable idea of relative hotness within a function. There's a
> lot to be done to exploit that information (even in the inliner!) without
> resorting to cross function analysis. If, after most of those have been
> implemented, we need more fundamental changes we could consider them.
> Starting with a fundamental rewrite of the profiling system within LLVM
> seems like a mistake.
> At a meta level, as someone who uses LLVM for JITing I would be opposed to a
> system that assumed consistent profiling counts across function boundaries
> and gave up on relative hotness information. At least if I'm understanding
> your proposal, this would *completely break* a multi-tiered JIT. In
> practice, you generally stop collecting instrumentation profiling once
> something is compiled at a high enough tier. When compiling its caller,
> you'll get very deceptive results if you rely on the execution counts to
> line up across functions. On the other hand, merging two relative hotness
> profiles by scaling based on the hotness of the callsite works out quite
> well in practice. You can use some information about global hotness to make
> decisions, but those decisions need to be resilient to such systematic skew.