[llvm-dev] The state of IRPGO (3 remaining work items)

Sean Silva via llvm-dev llvm-dev at lists.llvm.org
Thu Jun 2 17:30:45 PDT 2016


On Thu, Jun 2, 2016 at 2:51 PM, Frédéric Riss <friss at apple.com> wrote:

>
> On Jun 2, 2016, at 12:10 AM, Sean Silva <chisophugis at gmail.com> wrote:
>
>
>
> On Wed, Jun 1, 2016 at 5:46 PM, Frédéric Riss <friss at apple.com> wrote:
>
>>
>> On Jun 1, 2016, at 1:46 PM, Sean Silva <chisophugis at gmail.com> wrote:
>>
>>
>>
>> On Tue, May 31, 2016 at 6:02 PM, Frédéric Riss <friss at apple.com> wrote:
>>
>>>
>>> On May 24, 2016, at 5:21 PM, Sean Silva <chisophugis at gmail.com> wrote:
>>>
>>>
>>>
>>> On Tue, May 24, 2016 at 3:41 PM, Vedant Kumar <vsk at apple.com> wrote:
>>>
>>>>
>>>> > On May 23, 2016, at 8:56 PM, Xinliang David Li <davidxl at google.com>
>>>> wrote:
>>>> >
>>>> > On Mon, May 23, 2016 at 8:23 PM, Sean Silva <chisophugis at gmail.com>
>>>> wrote:
>>>> > Jake and I have been integrating IRPGO on PS4, and we've identified 3
>>>> remaining work items.
>>>> >
>>>> > Sean, thanks for the write up. It matches very well with what we
>>>> think as well.
>>>>
>>>> + 1
>>>>
>>>>
>>>> > - Driver changes
>>>> >
>>>> > We'd like to make IRPGO the default on PS4. We also think that it
>>>> would be beneficial to make IRPGO the default PGO on all platforms
>>>> (coverage would continue to use FE instr as it does currently, of course).
>>>> In previous conversations (e.g. http://reviews.llvm.org/D15829) it has
>>>> come up that Apple have requirements that would prevent them from moving to
>>>> IRPGO as the default PGO, at least without a deprecation period of one or
>>>> two releases.
>>>>
>>>> Sean pointed out the problematic scenario in D15829 (in plan "C"):
>>>>
>>>> ```
>>>> All existing user workflows continue to work, except for workflows that
>>>> attempt to llvm-profdata merge some old frontend profile data (e.g. they
>>>> have checked-in to version control and represents some special workload)
>>>> with the profile data from new binaries.
>>>> ```
>>>>
>>>> We can address this issue by (1) making sure llvm-profdata emits a
>>>> helpful warning when merging an FE-based profile with an IR-based one, and
>>>> (2) keeping an option to use FE instrumentation for PGO. Having (2) helps
>>>> people who can't (or don't want) to switch to IR PGO.
>>>>
>>>>
>>>> > I'd like to get consensus on a path forward.
>>>> > As a point of discussion, how about we make IRPGO the default on all
>>>> platforms except Apple platforms.
>>>>
>>>> I'd really rather not introduce this inconsistency. I'm worried that it
>>>> might lead to Darwin becoming a second-tier platform for PGO.
>>>>
>>>> Fred (CC'd) is following up with some of our internal users to check if
>>>> we can change the default behavior of -fprofile-instr-generate. He should
>>>> be able to chime in on this soon.
>>>>
>>>
>>> Sorry it took me so long.
>>>
>>
>> Hi Fred,
>>
>> My understanding is that you were specifically investigating whether
>> Apple needed compatibility for merging indexed profiles. Is that
>> compatibility needed? The only compelling argument I have heard to continue
>> to expose FEPGO is that Apple may have a compatibility requirement for
>> merging indexed profiles from previous compiler versions.
>>
>>
>> Sorry no, my comment had nothing to do with merging profiles. I
>> understand that this will break, and it might very well be an issue for us,
>> but I think there is a more fundamental issue with the proposed plan. As
>> you bring it up though, this is a user visible breakage that shouldn’t be
>> disregarded completely.
>>
>
> Merging with existing indexed profiles is the only user-visible breakage
> AFAIK (this was discussed at length in http://reviews.llvm.org/D15829 and
> the corresponding email thread). Please provide concrete examples where
> things would break.
>
>
> I feel like we are talking past each other. It is a fact that “merging
> existing indexed profiles” will break. You consider it to be a reasonable
> breakage, I’m just saying that we shouldn't ignore this completely. It is a
> known breakage so it should be considered when making the decision about
> the new flags’ behavior.
> We advise users to commit their profiles to VCS, so this could actually be
> a real issue for them (Xcode won’t try to merge old/new profiles by itself
> though, but users could do this). For those kind of changes where the
> alternative is not a full drop-in replacement, I think deprecating the
> option and replacing it with a new one is a better methodology.
>
>
>
>>
>> Even if this is a requirement, then I still intend to make IRPGO the
>> default and only PGO going forward, at least on PS4. I think that doing the
>> same for all platforms in the upstream compiler probably makes sense as
>> well, since an internal Apple vendor compatibility requirement should not
>> penalize all users of the open source project.
>>
>>
>> Again, I’m not expressing an Apple requirement, just trying to discuss
>> the specifics of the proposed implementation. My goal is not to hinder
>> anything, and I want our platforms to be able to use IRPGO reliably if
>> users see the need for it.
>>
>
> What I'm saying is that besides reduced training overhead (and the
> inability to merge with older indexed profiles, which AFAIK is the only
> actual potential requirement that would need a deprecation period for
> FEPGO), IRPGO is basically "just a better PGO", so adding a frontend one
> (except as something purely during a deprecation period) is
> pointless. "just a better PGO" is what IRPGO is for my users. I don't want
> to have to have them deal with (and I don't want to support) FEPGO.
>
>
> Well we want to support FEPGO because it fits Xcode's workflow better
> (mainly it allows to do PGO + coverage at the same time).
>
> Anything that will cause the existing flag to continue to produce FEPGO on
> PS4 is not something that I'm really okay with. The reduced overhead of
> IRPGO is really important on PS4 (i.e. the difference between the
> instrumented game being playable or not). I really don't want to have to
> test the triple to determine the meaning of `-fprofile-instr-generate`
> (without `-fcoverage-mapping`).
>
>
> And I agree that IRPGO is better for your users, but the best workflow for
> your users is not necessarily the best workflow for every clang user. I
> totally agree with you that testing the triple to decide the meaning of the
> option is wrong to do upstream. The best default is not a platform choice,
> it’s a workflow decision.
>
>
>> I’ve discussed the change in behavior quiet extensively, and I after
>>> having changed my mind a couple times, I would argue in favor of keeping
>>> the current behavior for the existing flags. I think adding a new switch
>>> for IRPGO is a better option. The argument that weighted most on my opinion
>>> is the proposed interaction with -fcoverage-mapping, and it is not at all
>>> platform specific. With the proposed new behavior, turning coverage on and
>>> off in your build system will generate a binary with different performance
>>> characteristics and this feels really wrong.
>>>
>>
>> Bob already mentioned in the other thread that `-fprofile-instr-generate
>> -fcoverage-mapping` was sufficiently different from
>> `-fprofile-instr-generate` that `-fprofile-instr-generate
>> -fcoverage-mapping` was not an acceptable workaround that could be used for
>> enabling FEPGO during a transitionary period, so I'm not convinced that
>> your argument here makes sense.
>>
>>
>> I’m not sure what you’re referring to here, and I have a hard time
>> parsing the sentence. I suppose “was not an acceptable” should read “was an
>> acceptable”? I would be surprised that Bob ever agreed to completely
>> transition away from FEPGO. I didn’t even understand that getting rid of
>> FEPGO was on table as you seem to imply bellow.
>>
>
> No, it is written as intended. The backstory is in
> http://reviews.llvm.org/D15829 (and the corresponding email thread). The
> paragraph starting with "The coverage mapping adds considerable cost.”.
>
>
> Thanks for the pointer, it turns out this part of the thread never made it
> to my mailbox. What Bob is saying is that we need an option to turn on
> FEPGO without coverage, and this is basically exactly what I think we
> should do: add an explicit option to choose the instrumentation type, so
> that we can use FEPGO even without coverage in the new IRPGO world.
>
>
>> I also share David's opinion that this is not going to be an issue in
>> practice. I think it makes sense for PGO and coverage to have different
>> overheads. Coverage inherently has to trace all locations at source level,
>> while PGO has more freedom.
>>
>>
>> I’m sorry if I wasn’t clear, but I’m not talking about instrumentation
>> overhead, I’m talking about the performance of the binary generated using
>> the profiles. If we go the route of making the meaning of
>> -fprofile-instr-generate depend on whether -fcoverage-mapping gets passed,
>> then we change the kind of instrumentation and thus the input to the
>> optimizations behind the user’s back. I wouldn’t be surprised that using
>> profiles generated by FEPGO and IRPGO give you a final executable with
>> measurably different performance characteristics.
>>
>
> I think the point is that given the effort being put into IRPGO, the IRPGO
> version will always be a faster final executable.
>
>
> 2 things:
>  - Do you have strong evidence of this being true in all cases? I’m not
> looking forward to when I’ll transition Apple’s clang to IRPGO, because I
> know that we’ll have to spend time characterizing the performance of the
> compiler. It might be faster in all but a couple benchmarks, we still will
> need to investigate what happened there to see if we can fix it. I’m not
> saying it’s bad (I’m actually also looking forward to evaluate it, because
> it might give better results for us), but it’s definitely not a benign
> change that I would do without double checking the results. And as such, I
> also wouldn’t want the compiler to silently do this change behind my back.
>  - Even if it is 100% true, then users will see a degradation in
> performance if they add -fcoverage-mapping to their CFLAGS. I think this is
> just bad user experience. The user should be able to decide if he wants to
> be able to compromise some performance to be able to do PGO+coverage at the
> same time.
>
> Why provide a "worse" PGO option?
>
>
> Worse from your point of view. The resilience to compiler changes that
> FEPGO currently offers is a real feature even if it doesn’t have a place in
> Sony’s workflow.
>
>
>
>> If you’re tracking your performance, this can be really painful. Recently
>> we wasted days investigating performance regressions that were due to buggy
>> profiles. I strongly believe having an option seemingly unrelated to PGO
>> change this behavior is wrong and can cause actual pain for our end users.
>>
>
> After a deprecation period we can force `-fprofile-instr-generate` and
> `-fcoverage-mapping` to be mutually exclusive if necessary. Does this solve
> your problem?
>
>
> We don’t want a “deprecation period”. Unless IRPGO evolves in something
> that can do precise coverage and PGO in the same run, FEPGO can’t be
> deprecated (as in slated for removal).
>
> Actually, I think it makes a lot of sense in some respects for
> `-fprofile-instr-generate` and `-fprofile-instr-generate
> -fcoverage-mapping` to be IRPGO and FEPGO/coverage. The difference from a
> user's perspective is basically "is the instrumentation inserted by the
> compiler constrained to have source-level coverage, or does the compiler
> not have this restriction". Although as I've said, I'm not a fan of
> supporting FEPGO in the long-term due to maintenance issues.
>
> Also note that things like the context-sensitivity obtained through
> pre-inlining (see Rong's original RFC) is simply not obtainable within a
> source-level instrumentation paradigm (even if we did something like the
> counter fusion discussed in "[llvm-dev] RFC: Pass to prune redundant
> profiling instrumentation" to reduce the overhead to that of IRPGO with
> pre-inlining). Thus FEPGO a.k.a. "coverage-level PGO" would nonetheless be
> at an inherent disadvantage.
>
>
>> Also, David's point about redundant work on FEPGO is a good one. We don't
>> want to continue maintaining two different PGO’s.
>>
>>
>> Are you implying that LLVM should drop FEPGO? It’s a totally sensible
>> thing to do to use your tests as training data for your profile generation.
>> It’s also a very valid thing to do to use your tests to do coverage. Xcode
>> does both of these things. I would see it a a big regression to not support
>> doing both at the same time (this would mean doubling compile+testing time
>> for users of both).
>>
>
> As David pointed out, training runs for PGO and coverage have different
> goals. I'm very skeptical of any testing that tries to do both at the same
> time, but this will continue to work (albeit without benefitting from any
> of the effort being put into IRPGO).
>
>
> For a high-end game, sure. Small apps are a different beast and it’s a
> reasonable thing to use your set of tests as training data. Again, all the
> workflows are not equivalent.
>
> As the instrumentation needs to stay there for coverage anyway, I expect
>> FEPGO to stay there and maintained (we care a lot about coverage). I’m not
>> saying that all the work going into IRPGO should be duplicated in FEPGO,
>> but what’s there and working should keep working.
>>
>
> For my users the reduced overhead of IRPGO is an important feature, and
> making it the default is important for that reason. Since most of the
> effort going into PGO is focused on IRPGO, this will lead to users using
> FEPGO ending up as a second-tier PGO, which Vedant said he specifically
> wanted to avoid. The only option to avoid this is for users to not be using
> FEPGO.
>
>
> I’d love to have everybody be able to enjoy the lower overhead, but if the
> answer to this is that they need to compile+train twice to get coverage and
> PGO, then the trade-off is wrong. I suppose you can agree that every user
> doesn’t have the same requirements as game developers. At least users
> should have the choice.
>
> Also, FEPGO has a lot of nice characteristics like resilience to IRGEN
>> changes. If you have archived profiles, then when you switch compilers your
>> performance shouldn’t degrade with FEPGO (modulo optimization bugs), while
>> it’s much more likely to degrade with IRPGO.
>>
>
> Note that this use case continues to work. I.e. we continue to apply
> existing frontend profiles correctly. including frontend profiles generated
> with -fcoverage-mapping, so that collecting coverage and PGO at the same
> time (although not advisable) still works. The only use case that breaks is
> merging existing indexed profiles, which is why we are specifically waiting
> for an answer on whether this is a requirement for you guys at Apple, which
> will determine what kind of deprecation period etc. will be needed before
> we can default it.
>
>
>> It overall looks like a much better option for people who do not need the
>> lower instrumentation overhead.
>>
>
> This is not just about lower instrumentation overhead. Things like the
> recently added static VP node allocation (which will e.g. make indirect
> callsite promotion for LTO'd kernels work) are other things are being
> missed out on.
>
>
>
>> I would actually make the IRPGO mode completely incompatible with the
>>> -fcoverage-mapping flag.
>>>
>>
>> I'm not sure what you mean by this. Nobody is proposing anything that
>> would make -fcoverage-mapping do anything related to IRPGO.
>>
>>
>> What I mean is that -f<whatever enables IRPGO> should error out when
>> passed at the same time as -fcoverage-mapping.
>>
>
> I think you're coming into this with the mindset that FEPGO will still be
> a possibility (outside of a build that is used for coverage mapping). I'm
> not convinced that we actually need to continue exposing that except as a
> weird thing in conjunction with coverage (and possibly for a deprecation
> period if users want to merge indexed profiles).
>
>
> FEPGO needs to be a possibility. And the instrumentation code needs to be
> there anyway for the coverage. Here’s a proposal:
>  - Add -fpgo-instr=[llvm|clang] to choose which kind of PGO
> instrumentation you want
>  - make -fpgo-instr=llvm incompatible with with -fcoverage-mapping
>  - deprecate -fprofile-instr-generate, and remove it after a couple
> releases
>
> (I really don’t care about the names)
>
> I think it is a better way forward to have -fprofile-instr-generate keep
> it’s current meaning and to have downstream toolchain providers have an
> internal patch to make it an alias to the one they prefer. Just because
> changing the meaning of options is a bad thing.
>

Sure, I can deal with that. And actually this discussion has suggested some
good user-understandable names for the two different PGO's. I think that
"stable PGO" or "coverage-based PGO" or "source-level PGO" is a really good
name for FEPGO. "frontend" and "IR" aren't really meaningful to users.

Based on what I have seen, FEPGO has the following reasons to keep it
around:
- it currently can be used together with coverage
- it has certain compatibility guarantees about profiles across compiler
versions



> This also means that if the consensus is that -fprofile-instr-generate
> should really change its meaning to mean IRPGO, I’m open to having this
> internal patch on our side.
>

Yeah, it sounds like someone is going to have to keep a "private patch" to
change the default. At that point doing a switch on the triple in upstream
seems preferable though :/


So I propose the following (which is equivalent to what you proposed, but
starting to put specific option names):
1. Add -fprofile-instr-generate=stable and -fprofile-instr-generate=unstable
a) Indexed profiles for -fprofile-instr-generate=stable are guaranteed to
be compatible across compiler releases (which will include existing
releases). Additionally, -fprofile-instr-generate=stable can be used in
combination with -fcoverage-mapping.
b) Indexed profiles for -fprofile-instr-generate=unstable are not
guaranteed to be compatible across compiler releases. It is an error to use
this in conjunction with -fcoverage-mapping
c) -fprofile-instr-generate defaults to one of the two in a way that meets
vendor needs, via private patches or an upstream switch on the triple or
whatever. It can eventually be deprecated or whatever.

-fprofile-instr-generate=stable would basically correspond to FEPGO / FE
coverage instrumentation. -fprofile-instr-generate=unstable would basically
correspond to IRPGO.

What do you think?

-- Sean Silva


>
> I also think a new option is cleaner (again I don’t care about the name),
> but if people feel that -fprofile-instr-generate=[llvm|clang] is easier,
> then this works for me too.
>
> Fred
>
>
> -- Sean Silva
>
>
>>
>> Thanks!
>> Fred
>>
>> -- Sean Silva
>>
>>
>>
>>>
>>> Fred
>>>
>>>
>>>
>>>>
>>>> At its core I don't think -fprofile-instr-generate *implies* FE-based
>>>> instrumentation. So, I'd like to see the driver do this (on all platforms):
>>>>
>>>>   * -fprofile-instr-generate: IR instrumentation
>>>>   * -fprofile-instr-generate=IR: IR instrumentation
>>>>   * -fprofile-instr-generate=FE: FE instrumentation
>>>>   * -fprofile-instr-generate -fcoverage-mapping: FE + coverage
>>>> instrumentation
>>>>
>>>> It's a bit ugly because the meaning of -fprofile-instr-generate becomes
>>>> context-sensitive. But, (1) it doesn't break existing common workflows and
>>>> (2) it makes it easier to ship IRPGO. The big caveat here is that we'll
>>>> need to wait a bit and see if our internal users are OK with this.
>>>>
>>>
>>> Is there a reason to even have the possibility for FEPGO in the long
>>> run? From what I can tell, at most we would add a
>>> -fuse-the-old-pgo-because-i-want-to-merge-with-old-profiles option to
>>> hold people over until they can regenerate their profiles with the current
>>> compiler. We can add a flag to control what pre-instrumentation is done to
>>> retain the source-level robustness of FEPGO (e.g.
>>> -fpgo-no-simplify-before-instrumenting or something).
>>>
>>>
>>>> One alternative is to introduce a separate driver flag for IRPGO. This
>>>> might not work well for Sony's existing users. I'd be interested in any
>>>> feedback about this approach.
>>>>
>>>
>>> Personally, I would prefer to maintaining command line compatibility for
>>> PGO in Clang (i.e. users don't have to modify their build systems).
>>>
>>>
>>> -- Sean Silva
>>>
>>>
>>>>
>>>>
>>>> > I really don't like fragmenting things like this (e.g. if a
>>>> third-party tests "clang's" PGO they will get something different depending
>>>> on the platform), but I don't see another way given Apple's constraints.
>>>> >
>>>> > I'd like to see IRPGO to be the default as well, but the first thing
>>>> we need is a driver level option to make the switch (prof-gen) -- currently
>>>> we rely on -Xclang option to switch between two modes, which is less than
>>>> ideal.
>>>> >
>>>> > If the concern from Apple is that the old profile still need to work,
>>>> then this is problem already solved. The reason is that -fprofile-instr-use
>>>> can automatically detect the type of the profile and switch the mode.
>>>>
>>>> It's not just that. As Sean pointed out, we're concerned about old
>>>> profiles inter-operating poorly with new ones.
>>>>
>>>> thanks,
>>>> vedant
>>>>
>>>>
>>>> > - Pre-instrumentation passes
>>>> >
>>>> > Pre-instrumentation optimization has been critical for reducing the
>>>> overhead of PGO for the PS4 games we tested (as expected). However, in our
>>>> measurements (and we are glad to provide more info) the main benefit was
>>>> inlining (also as expected). A simple pass of inlining at threshold 100
>>>> appeared to give all the benefits. Even inlining at threshold 0 gave almost
>>>> all the benefits. For example, the passes initially proposed in
>>>> http://reviews.llvm.org/D15828did not improve over just inlining with
>>>> threshold 100.
>>>> >
>>>> > (due to PR27299 we also need to add simplifycfg after inlining to
>>>> clean up, but this doesn't affect the instrumentation overhead in our
>>>> measurements)
>>>> >
>>>> > Bottom line: for our use cases, inlining does all the work, but we're
>>>> not opposed to having more passes, which might be beneficial for non-game
>>>> workloads (which is most code).
>>>> >
>>>> >
>>>> >
>>>> > Yes, Rong is re-collecting performance data before submitting the
>>>> patch.
>>>> >
>>>> > - Warnings
>>>> >
>>>> > We identified 3 classes of issues which manifest as spammy warnings
>>>> when applying profile data with IRPGO (these affect FEPGO also I believe,
>>>> but we looked in depth at IRPGO):
>>>> >
>>>> > 1. The main concerning one is that getPGOFuncName mangles the
>>>> filename into the counter name. This causes us to get
>>>> instrprof_error::unknown_function when the pgo-use build is done in a
>>>> different build directory from the training build (which is a reasonable
>>>> thing to support). In this situation, PGO data is useless for all `static`
>>>> functions (and as a byproduct results in a huge volume of warnings).
>>>> >
>>>> > This can be enhanced with an user option to override the behavior.
>>>> Can you help filing a tracking bug?
>>>> >
>>>> >
>>>> > 2. In different TU's, pre-instr inlining might make different
>>>> inlining decisions (for example, different functions may be available for
>>>> inlining), causing hash mismatch errors (instrprof_error::hash_mismatch).
>>>> In building a large game, we only saw 8 instance of this, so it is not as
>>>> severe as 1, but would be good to fix.
>>>> >
>>>> >
>>>> > Rong has a patch addressing that -- will submit after cleanup pass
>>>> change is done.
>>>> >
>>>> >
>>>> > 3. A .cpp file may be compiled and put into an archive, but then not
>>>> selected by the linker and will therefore not result in a counter in the
>>>> profraw. When compiling this file with pgo-use,
>>>> instrprof_error::unknown_function will result and a warning will be emitted.
>>>> >
>>>> > yes -- this is a common problem to other compilers as well.
>>>> >
>>>> >
>>>> >
>>>> > Case 1 can be fixed using a function hash or other unique identifier
>>>> instead of a file path. David, in D20195 you mentioned that Rong was
>>>> working on a patch that would fix 2; we are looking forward to that.
>>>> >
>>>> >
>>>> > Right.
>>>> >
>>>> > For 3, I unfortunately do not know of any solution. I don't think
>>>> there is a way for us to make this warning reliable in the face of this
>>>> circumstance. So my conclusion is that instrprof_error::unknown_function at
>>>> least must be defaulted to off unfortunately.
>>>> >
>>>> > yes, this can be annoying. If the warnings can be buffered, then the
>>>> compiler can check if this is due to missing profile for the whole file and
>>>> can reduce the warnings into one single warning (source file has no profile
>>>> data).  Making it off by default sounds fine to me too if it is too noisy.
>>>> >
>>>> > thanks,
>>>> >
>>>> > David
>>>> >
>>>> >
>>>> > -- Sean Silva
>>>>
>>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160602/08f3a156/attachment-0001.html>


More information about the llvm-dev mailing list