[llvm-dev] The state of IRPGO (3 remaining work items)

Xinliang David Li via llvm-dev llvm-dev at lists.llvm.org
Mon Jun 13 15:07:37 PDT 2016


No problem. Inliner work is certainly higher priority. Rong can help with
the option related work.

thanks,
David

On Mon, Jun 13, 2016 at 3:05 PM, Sean Silva <chisophugis at gmail.com> wrote:

> Quick update. I've gotten derailed from posting a patch for this due to
> focusing on higher priority PGO inlining work. No ETA.
>
> -- Sean Silva
>
>
> On Fri, Jun 3, 2016 at 6:06 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
>>
>>
>> On Thu, Jun 2, 2016 at 6:41 PM, Xinliang David Li <davidxl at google.com>
>> wrote:
>>
>>>
>>>
>>> On Thu, Jun 2, 2016 at 5:30 PM, Sean Silva <chisophugis at gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Thu, Jun 2, 2016 at 2:51 PM, Frédéric Riss <friss at apple.com> wrote:
>>>>
>>>>>
>>>>> On Jun 2, 2016, at 12:10 AM, Sean Silva <chisophugis at gmail.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jun 1, 2016 at 5:46 PM, Frédéric Riss <friss at apple.com> wrote:
>>>>>
>>>>>>
>>>>>> On Jun 1, 2016, at 1:46 PM, Sean Silva <chisophugis at gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, May 31, 2016 at 6:02 PM, Frédéric Riss <friss at apple.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> On May 24, 2016, at 5:21 PM, Sean Silva <chisophugis at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, May 24, 2016 at 3:41 PM, Vedant Kumar <vsk at apple.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> > On May 23, 2016, at 8:56 PM, Xinliang David Li <
>>>>>>>> davidxl at google.com> wrote:
>>>>>>>> >
>>>>>>>> > On Mon, May 23, 2016 at 8:23 PM, Sean Silva <
>>>>>>>> chisophugis at gmail.com> wrote:
>>>>>>>> > Jake and I have been integrating IRPGO on PS4, and we've
>>>>>>>> identified 3 remaining work items.
>>>>>>>> >
>>>>>>>> > Sean, thanks for the write up. It matches very well with what we
>>>>>>>> think as well.
>>>>>>>>
>>>>>>>> + 1
>>>>>>>>
>>>>>>>>
>>>>>>>> > - Driver changes
>>>>>>>> >
>>>>>>>> > We'd like to make IRPGO the default on PS4. We also think that it
>>>>>>>> would be beneficial to make IRPGO the default PGO on all platforms
>>>>>>>> (coverage would continue to use FE instr as it does currently, of course).
>>>>>>>> In previous conversations (e.g. http://reviews.llvm.org/D15829) it
>>>>>>>> has come up that Apple have requirements that would prevent them from
>>>>>>>> moving to IRPGO as the default PGO, at least without a deprecation period
>>>>>>>> of one or two releases.
>>>>>>>>
>>>>>>>> Sean pointed out the problematic scenario in D15829 (in plan "C"):
>>>>>>>>
>>>>>>>> ```
>>>>>>>> All existing user workflows continue to work, except for workflows
>>>>>>>> that attempt to llvm-profdata merge some old frontend profile data (e.g.
>>>>>>>> they have checked-in to version control and represents some special
>>>>>>>> workload) with the profile data from new binaries.
>>>>>>>> ```
>>>>>>>>
>>>>>>>> We can address this issue by (1) making sure llvm-profdata emits a
>>>>>>>> helpful warning when merging an FE-based profile with an IR-based one, and
>>>>>>>> (2) keeping an option to use FE instrumentation for PGO. Having (2) helps
>>>>>>>> people who can't (or don't want) to switch to IR PGO.
>>>>>>>>
>>>>>>>>
>>>>>>>> > I'd like to get consensus on a path forward.
>>>>>>>> > As a point of discussion, how about we make IRPGO the default on
>>>>>>>> all platforms except Apple platforms.
>>>>>>>>
>>>>>>>> I'd really rather not introduce this inconsistency. I'm worried
>>>>>>>> that it might lead to Darwin becoming a second-tier platform for PGO.
>>>>>>>>
>>>>>>>> Fred (CC'd) is following up with some of our internal users to
>>>>>>>> check if we can change the default behavior of -fprofile-instr-generate. He
>>>>>>>> should be able to chime in on this soon.
>>>>>>>>
>>>>>>>
>>>>>>> Sorry it took me so long.
>>>>>>>
>>>>>>
>>>>>> Hi Fred,
>>>>>>
>>>>>> My understanding is that you were specifically investigating whether
>>>>>> Apple needed compatibility for merging indexed profiles. Is that
>>>>>> compatibility needed? The only compelling argument I have heard to continue
>>>>>> to expose FEPGO is that Apple may have a compatibility requirement for
>>>>>> merging indexed profiles from previous compiler versions.
>>>>>>
>>>>>>
>>>>>> Sorry no, my comment had nothing to do with merging profiles. I
>>>>>> understand that this will break, and it might very well be an issue for us,
>>>>>> but I think there is a more fundamental issue with the proposed plan. As
>>>>>> you bring it up though, this is a user visible breakage that shouldn’t be
>>>>>> disregarded completely.
>>>>>>
>>>>>
>>>>> Merging with existing indexed profiles is the only user-visible
>>>>> breakage AFAIK (this was discussed at length in
>>>>> http://reviews.llvm.org/D15829 and the corresponding email thread).
>>>>> Please provide concrete examples where things would break.
>>>>>
>>>>>
>>>>> I feel like we are talking past each other. It is a fact that “merging
>>>>> existing indexed profiles” will break. You consider it to be a reasonable
>>>>> breakage, I’m just saying that we shouldn't ignore this completely. It is a
>>>>> known breakage so it should be considered when making the decision about
>>>>> the new flags’ behavior.
>>>>> We advise users to commit their profiles to VCS, so this could
>>>>> actually be a real issue for them (Xcode won’t try to merge old/new
>>>>> profiles by itself though, but users could do this). For those kind of
>>>>> changes where the alternative is not a full drop-in replacement, I think
>>>>> deprecating the option and replacing it with a new one is a better
>>>>> methodology.
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Even if this is a requirement, then I still intend to make IRPGO the
>>>>>> default and only PGO going forward, at least on PS4. I think that doing the
>>>>>> same for all platforms in the upstream compiler probably makes sense as
>>>>>> well, since an internal Apple vendor compatibility requirement should not
>>>>>> penalize all users of the open source project.
>>>>>>
>>>>>>
>>>>>> Again, I’m not expressing an Apple requirement, just trying to
>>>>>> discuss the specifics of the proposed implementation. My goal is not to
>>>>>> hinder anything, and I want our platforms to be able to use IRPGO reliably
>>>>>> if users see the need for it.
>>>>>>
>>>>>
>>>>> What I'm saying is that besides reduced training overhead (and the
>>>>> inability to merge with older indexed profiles, which AFAIK is the only
>>>>> actual potential requirement that would need a deprecation period for
>>>>> FEPGO), IRPGO is basically "just a better PGO", so adding a frontend one
>>>>> (except as something purely during a deprecation period) is
>>>>> pointless. "just a better PGO" is what IRPGO is for my users. I don't want
>>>>> to have to have them deal with (and I don't want to support) FEPGO.
>>>>>
>>>>>
>>>>> Well we want to support FEPGO because it fits Xcode's workflow better
>>>>> (mainly it allows to do PGO + coverage at the same time).
>>>>>
>>>>> Anything that will cause the existing flag to continue to produce
>>>>> FEPGO on PS4 is not something that I'm really okay with. The reduced
>>>>> overhead of IRPGO is really important on PS4 (i.e. the difference between
>>>>> the instrumented game being playable or not). I really don't want to have
>>>>> to test the triple to determine the meaning of `-fprofile-instr-generate`
>>>>> (without `-fcoverage-mapping`).
>>>>>
>>>>>
>>>>> And I agree that IRPGO is better for your users, but the best workflow
>>>>> for your users is not necessarily the best workflow for every clang user. I
>>>>> totally agree with you that testing the triple to decide the meaning of the
>>>>> option is wrong to do upstream. The best default is not a platform choice,
>>>>> it’s a workflow decision.
>>>>>
>>>>>
>>>>>> I’ve discussed the change in behavior quiet extensively, and I after
>>>>>>> having changed my mind a couple times, I would argue in favor of keeping
>>>>>>> the current behavior for the existing flags. I think adding a new switch
>>>>>>> for IRPGO is a better option. The argument that weighted most on my opinion
>>>>>>> is the proposed interaction with -fcoverage-mapping, and it is not at all
>>>>>>> platform specific. With the proposed new behavior, turning coverage on and
>>>>>>> off in your build system will generate a binary with different performance
>>>>>>> characteristics and this feels really wrong.
>>>>>>>
>>>>>>
>>>>>> Bob already mentioned in the other thread that
>>>>>> `-fprofile-instr-generate -fcoverage-mapping` was sufficiently different
>>>>>> from `-fprofile-instr-generate` that `-fprofile-instr-generate
>>>>>> -fcoverage-mapping` was not an acceptable workaround that could be used for
>>>>>> enabling FEPGO during a transitionary period, so I'm not convinced that
>>>>>> your argument here makes sense.
>>>>>>
>>>>>>
>>>>>> I’m not sure what you’re referring to here, and I have a hard time
>>>>>> parsing the sentence. I suppose “was not an acceptable” should read “was an
>>>>>> acceptable”? I would be surprised that Bob ever agreed to completely
>>>>>> transition away from FEPGO. I didn’t even understand that getting rid of
>>>>>> FEPGO was on table as you seem to imply bellow.
>>>>>>
>>>>>
>>>>> No, it is written as intended. The backstory is in
>>>>> http://reviews.llvm.org/D15829 (and the corresponding email thread).
>>>>> The paragraph starting with "The coverage mapping adds considerable cost.”.
>>>>>
>>>>>
>>>>> Thanks for the pointer, it turns out this part of the thread never
>>>>> made it to my mailbox. What Bob is saying is that we need an option to turn
>>>>> on FEPGO without coverage, and this is basically exactly what I think we
>>>>> should do: add an explicit option to choose the instrumentation type, so
>>>>> that we can use FEPGO even without coverage in the new IRPGO world.
>>>>>
>>>>>
>>>>>> I also share David's opinion that this is not going to be an issue in
>>>>>> practice. I think it makes sense for PGO and coverage to have different
>>>>>> overheads. Coverage inherently has to trace all locations at source level,
>>>>>> while PGO has more freedom.
>>>>>>
>>>>>>
>>>>>> I’m sorry if I wasn’t clear, but I’m not talking about
>>>>>> instrumentation overhead, I’m talking about the performance of the binary
>>>>>> generated using the profiles. If we go the route of making the meaning of
>>>>>> -fprofile-instr-generate depend on whether -fcoverage-mapping gets passed,
>>>>>> then we change the kind of instrumentation and thus the input to the
>>>>>> optimizations behind the user’s back. I wouldn’t be surprised that using
>>>>>> profiles generated by FEPGO and IRPGO give you a final executable with
>>>>>> measurably different performance characteristics.
>>>>>>
>>>>>
>>>>> I think the point is that given the effort being put into IRPGO, the
>>>>> IRPGO version will always be a faster final executable.
>>>>>
>>>>>
>>>>> 2 things:
>>>>>  - Do you have strong evidence of this being true in all cases? I’m
>>>>> not looking forward to when I’ll transition Apple’s clang to IRPGO, because
>>>>> I know that we’ll have to spend time characterizing the performance of the
>>>>> compiler. It might be faster in all but a couple benchmarks, we still will
>>>>> need to investigate what happened there to see if we can fix it. I’m not
>>>>> saying it’s bad (I’m actually also looking forward to evaluate it, because
>>>>> it might give better results for us), but it’s definitely not a benign
>>>>> change that I would do without double checking the results. And as such, I
>>>>> also wouldn’t want the compiler to silently do this change behind my back.
>>>>>  - Even if it is 100% true, then users will see a degradation in
>>>>> performance if they add -fcoverage-mapping to their CFLAGS. I think this is
>>>>> just bad user experience. The user should be able to decide if he wants to
>>>>> be able to compromise some performance to be able to do PGO+coverage at the
>>>>> same time.
>>>>>
>>>>> Why provide a "worse" PGO option?
>>>>>
>>>>>
>>>>> Worse from your point of view. The resilience to compiler changes that
>>>>> FEPGO currently offers is a real feature even if it doesn’t have a place in
>>>>> Sony’s workflow.
>>>>>
>>>>>
>>>>>
>>>>>> If you’re tracking your performance, this can be really painful.
>>>>>> Recently we wasted days investigating performance regressions that were due
>>>>>> to buggy profiles. I strongly believe having an option seemingly unrelated
>>>>>> to PGO change this behavior is wrong and can cause actual pain for our end
>>>>>> users.
>>>>>>
>>>>>
>>>>> After a deprecation period we can force `-fprofile-instr-generate` and
>>>>> `-fcoverage-mapping` to be mutually exclusive if necessary. Does this solve
>>>>> your problem?
>>>>>
>>>>>
>>>>> We don’t want a “deprecation period”. Unless IRPGO evolves in
>>>>> something that can do precise coverage and PGO in the same run, FEPGO can’t
>>>>> be deprecated (as in slated for removal).
>>>>>
>>>>> Actually, I think it makes a lot of sense in some respects for
>>>>> `-fprofile-instr-generate` and `-fprofile-instr-generate
>>>>> -fcoverage-mapping` to be IRPGO and FEPGO/coverage. The difference from a
>>>>> user's perspective is basically "is the instrumentation inserted by the
>>>>> compiler constrained to have source-level coverage, or does the compiler
>>>>> not have this restriction". Although as I've said, I'm not a fan of
>>>>> supporting FEPGO in the long-term due to maintenance issues.
>>>>>
>>>>> Also note that things like the context-sensitivity obtained through
>>>>> pre-inlining (see Rong's original RFC) is simply not obtainable within a
>>>>> source-level instrumentation paradigm (even if we did something like the
>>>>> counter fusion discussed in "[llvm-dev] RFC: Pass to prune redundant
>>>>> profiling instrumentation" to reduce the overhead to that of IRPGO with
>>>>> pre-inlining). Thus FEPGO a.k.a. "coverage-level PGO" would nonetheless be
>>>>> at an inherent disadvantage.
>>>>>
>>>>>
>>>>>> Also, David's point about redundant work on FEPGO is a good one. We
>>>>>> don't want to continue maintaining two different PGO’s.
>>>>>>
>>>>>>
>>>>>> Are you implying that LLVM should drop FEPGO? It’s a totally sensible
>>>>>> thing to do to use your tests as training data for your profile generation.
>>>>>> It’s also a very valid thing to do to use your tests to do coverage. Xcode
>>>>>> does both of these things. I would see it a a big regression to not support
>>>>>> doing both at the same time (this would mean doubling compile+testing time
>>>>>> for users of both).
>>>>>>
>>>>>
>>>>> As David pointed out, training runs for PGO and coverage have
>>>>> different goals. I'm very skeptical of any testing that tries to do both at
>>>>> the same time, but this will continue to work (albeit without benefitting
>>>>> from any of the effort being put into IRPGO).
>>>>>
>>>>>
>>>>> For a high-end game, sure. Small apps are a different beast and it’s a
>>>>> reasonable thing to use your set of tests as training data. Again, all the
>>>>> workflows are not equivalent.
>>>>>
>>>>> As the instrumentation needs to stay there for coverage anyway, I
>>>>>> expect FEPGO to stay there and maintained (we care a lot about coverage).
>>>>>> I’m not saying that all the work going into IRPGO should be duplicated in
>>>>>> FEPGO, but what’s there and working should keep working.
>>>>>>
>>>>>
>>>>> For my users the reduced overhead of IRPGO is an important feature,
>>>>> and making it the default is important for that reason. Since most of the
>>>>> effort going into PGO is focused on IRPGO, this will lead to users using
>>>>> FEPGO ending up as a second-tier PGO, which Vedant said he specifically
>>>>> wanted to avoid. The only option to avoid this is for users to not be using
>>>>> FEPGO.
>>>>>
>>>>>
>>>>> I’d love to have everybody be able to enjoy the lower overhead, but if
>>>>> the answer to this is that they need to compile+train twice to get coverage
>>>>> and PGO, then the trade-off is wrong. I suppose you can agree that every
>>>>> user doesn’t have the same requirements as game developers. At least users
>>>>> should have the choice.
>>>>>
>>>>> Also, FEPGO has a lot of nice characteristics like resilience to IRGEN
>>>>>> changes. If you have archived profiles, then when you switch compilers your
>>>>>> performance shouldn’t degrade with FEPGO (modulo optimization bugs), while
>>>>>> it’s much more likely to degrade with IRPGO.
>>>>>>
>>>>>
>>>>> Note that this use case continues to work. I.e. we continue to apply
>>>>> existing frontend profiles correctly. including frontend profiles generated
>>>>> with -fcoverage-mapping, so that collecting coverage and PGO at the same
>>>>> time (although not advisable) still works. The only use case that breaks is
>>>>> merging existing indexed profiles, which is why we are specifically waiting
>>>>> for an answer on whether this is a requirement for you guys at Apple, which
>>>>> will determine what kind of deprecation period etc. will be needed before
>>>>> we can default it.
>>>>>
>>>>>
>>>>>> It overall looks like a much better option for people who do not need
>>>>>> the lower instrumentation overhead.
>>>>>>
>>>>>
>>>>> This is not just about lower instrumentation overhead. Things like the
>>>>> recently added static VP node allocation (which will e.g. make indirect
>>>>> callsite promotion for LTO'd kernels work) are other things are being
>>>>> missed out on.
>>>>>
>>>>>
>>>>>
>>>>>> I would actually make the IRPGO mode completely incompatible with the
>>>>>>> -fcoverage-mapping flag.
>>>>>>>
>>>>>>
>>>>>> I'm not sure what you mean by this. Nobody is proposing anything that
>>>>>> would make -fcoverage-mapping do anything related to IRPGO.
>>>>>>
>>>>>>
>>>>>> What I mean is that -f<whatever enables IRPGO> should error out when
>>>>>> passed at the same time as -fcoverage-mapping.
>>>>>>
>>>>>
>>>>> I think you're coming into this with the mindset that FEPGO will still
>>>>> be a possibility (outside of a build that is used for coverage mapping).
>>>>> I'm not convinced that we actually need to continue exposing that except as
>>>>> a weird thing in conjunction with coverage (and possibly for a deprecation
>>>>> period if users want to merge indexed profiles).
>>>>>
>>>>>
>>>>> FEPGO needs to be a possibility. And the instrumentation code needs to
>>>>> be there anyway for the coverage. Here’s a proposal:
>>>>>  - Add -fpgo-instr=[llvm|clang] to choose which kind of PGO
>>>>> instrumentation you want
>>>>>  - make -fpgo-instr=llvm incompatible with with -fcoverage-mapping
>>>>>  - deprecate -fprofile-instr-generate, and remove it after a couple
>>>>> releases
>>>>>
>>>>> (I really don’t care about the names)
>>>>>
>>>>> I think it is a better way forward to have -fprofile-instr-generate
>>>>> keep it’s current meaning and to have downstream toolchain providers have
>>>>> an internal patch to make it an alias to the one they prefer. Just because
>>>>> changing the meaning of options is a bad thing.
>>>>>
>>>>
>>>> Sure, I can deal with that. And actually this discussion has suggested
>>>> some good user-understandable names for the two different PGO's. I think
>>>> that "stable PGO" or "coverage-based PGO" or "source-level PGO" is a really
>>>> good name for FEPGO. "frontend" and "IR" aren't really meaningful to users.
>>>>
>>>> Based on what I have seen, FEPGO has the following reasons to keep it
>>>> around:
>>>> - it currently can be used together with coverage
>>>> - it has certain compatibility guarantees about profiles across
>>>> compiler versions
>>>>
>>>>
>>>>
>>>>> This also means that if the consensus is that -fprofile-instr-generate
>>>>> should really change its meaning to mean IRPGO, I’m open to having this
>>>>> internal patch on our side.
>>>>>
>>>>
>>>> Yeah, it sounds like someone is going to have to keep a "private patch"
>>>> to change the default. At that point doing a switch on the triple in
>>>> upstream seems preferable though :/
>>>>
>>>>
>>>> So I propose the following (which is equivalent to what you proposed,
>>>> but starting to put specific option names):
>>>> 1. Add -fprofile-instr-generate=stable and
>>>> -fprofile-instr-generate=unstable
>>>> a) Indexed profiles for -fprofile-instr-generate=stable are guaranteed
>>>> to be compatible across compiler releases (which will include existing
>>>> releases). Additionally, -fprofile-instr-generate=stable can be used in
>>>> combination with -fcoverage-mapping.
>>>> b) Indexed profiles for -fprofile-instr-generate=unstable are not
>>>> guaranteed to be compatible across compiler releases. It is an error to use
>>>> this in conjunction with -fcoverage-mapping
>>>> c) -fprofile-instr-generate defaults to one of the two in a way that
>>>> meets vendor needs, via private patches or an upstream switch on the triple
>>>> or whatever. It can eventually be deprecated or whatever.
>>>>
>>>> -fprofile-instr-generate=stable would basically correspond to FEPGO /
>>>> FE coverage instrumentation. -fprofile-instr-generate=unstable would
>>>> basically correspond to IRPGO.
>>>>
>>>> What do you think?
>>>>
>>>
>>>
>>> Sounds fine to me, though I am not a fan of using unstable in the
>>> option. I think a more meaningful way (that capture the essence of the
>>> difference) is the following naming:
>>> 1) FEPGO:  -fprofile-instr-generate=source or
>>> -fprofile-instr-generate=region
>>> 2) IR: -fprofile-instr-generate=cfg or -fprofile-instr-generate=graph
>>>
>>> Also since -fprofile-instr-generate= form is already used to specify raw
>>> profile path, we may need a different driver option.
>>>
>>
>> Oops!
>>
>>
>>> Alternatives include
>>> 1) -fprofile-instrument=<...> -- this maps directly to the cc1 option we
>>> have today
>>> or
>>> 2) -fpgo-instr=<> -- suggested by Fred or
>>> 3) -fpgo-instr-method=<...>
>>>
>>
>> I will probably use one of these in the patch.
>>
>> -- Sean Silva
>>
>>
>>>
>>>
>>> Having the driver level option is the first good step forward.  In the
>>> near future, when performance of IRPGO is further tuned (e.g, better
>>> integration with inliner, more runtime perf win with -flto=thin, -flto
>>> etc), and the interests of the IRPGO is better aligned, we can revisit the
>>> default.
>>>
>>> thanks,
>>>
>>> David
>>>
>>>
>>>
>>>
>>>>
>>>> -- Sean Silva
>>>>
>>>>
>>>>>
>>>>> I also think a new option is cleaner (again I don’t care about the
>>>>> name), but if people feel that -fprofile-instr-generate=[llvm|clang] is
>>>>> easier, then this works for me too.
>>>>>
>>>>> Fred
>>>>>
>>>>>
>>>>> -- Sean Silva
>>>>>
>>>>>
>>>>>>
>>>>>> Thanks!
>>>>>> Fred
>>>>>>
>>>>>> -- Sean Silva
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Fred
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> At its core I don't think -fprofile-instr-generate *implies*
>>>>>>>> FE-based instrumentation. So, I'd like to see the driver do this (on all
>>>>>>>> platforms):
>>>>>>>>
>>>>>>>>   * -fprofile-instr-generate: IR instrumentation
>>>>>>>>   * -fprofile-instr-generate=IR: IR instrumentation
>>>>>>>>   * -fprofile-instr-generate=FE: FE instrumentation
>>>>>>>>   * -fprofile-instr-generate -fcoverage-mapping: FE + coverage
>>>>>>>> instrumentation
>>>>>>>>
>>>>>>>> It's a bit ugly because the meaning of -fprofile-instr-generate
>>>>>>>> becomes context-sensitive. But, (1) it doesn't break existing common
>>>>>>>> workflows and (2) it makes it easier to ship IRPGO. The big caveat here is
>>>>>>>> that we'll need to wait a bit and see if our internal users are OK with
>>>>>>>> this.
>>>>>>>>
>>>>>>>
>>>>>>> Is there a reason to even have the possibility for FEPGO in the long
>>>>>>> run? From what I can tell, at most we would add a
>>>>>>> -fuse-the-old-pgo-because-i-want-to-merge-with-old-profiles option
>>>>>>> to hold people over until they can regenerate their profiles with the
>>>>>>> current compiler. We can add a flag to control what pre-instrumentation is
>>>>>>> done to retain the source-level robustness of FEPGO (e.g.
>>>>>>> -fpgo-no-simplify-before-instrumenting or something).
>>>>>>>
>>>>>>>
>>>>>>>> One alternative is to introduce a separate driver flag for IRPGO.
>>>>>>>> This might not work well for Sony's existing users. I'd be interested in
>>>>>>>> any feedback about this approach.
>>>>>>>>
>>>>>>>
>>>>>>> Personally, I would prefer to maintaining command line compatibility
>>>>>>> for PGO in Clang (i.e. users don't have to modify their build systems).
>>>>>>>
>>>>>>>
>>>>>>> -- Sean Silva
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> > I really don't like fragmenting things like this (e.g. if a
>>>>>>>> third-party tests "clang's" PGO they will get something different depending
>>>>>>>> on the platform), but I don't see another way given Apple's constraints.
>>>>>>>> >
>>>>>>>> > I'd like to see IRPGO to be the default as well, but the first
>>>>>>>> thing we need is a driver level option to make the switch (prof-gen) --
>>>>>>>> currently we rely on -Xclang option to switch between two modes, which is
>>>>>>>> less than ideal.
>>>>>>>> >
>>>>>>>> > If the concern from Apple is that the old profile still need to
>>>>>>>> work, then this is problem already solved. The reason is that
>>>>>>>> -fprofile-instr-use can automatically detect the type of the profile and
>>>>>>>> switch the mode.
>>>>>>>>
>>>>>>>> It's not just that. As Sean pointed out, we're concerned about old
>>>>>>>> profiles inter-operating poorly with new ones.
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>> vedant
>>>>>>>>
>>>>>>>>
>>>>>>>> > - Pre-instrumentation passes
>>>>>>>> >
>>>>>>>> > Pre-instrumentation optimization has been critical for reducing
>>>>>>>> the overhead of PGO for the PS4 games we tested (as expected). However, in
>>>>>>>> our measurements (and we are glad to provide more info) the main benefit
>>>>>>>> was inlining (also as expected). A simple pass of inlining at threshold 100
>>>>>>>> appeared to give all the benefits. Even inlining at threshold 0 gave almost
>>>>>>>> all the benefits. For example, the passes initially proposed in
>>>>>>>> http://reviews.llvm.org/D15828did not improve over just inlining
>>>>>>>> with threshold 100.
>>>>>>>> >
>>>>>>>> > (due to PR27299 we also need to add simplifycfg after inlining to
>>>>>>>> clean up, but this doesn't affect the instrumentation overhead in our
>>>>>>>> measurements)
>>>>>>>> >
>>>>>>>> > Bottom line: for our use cases, inlining does all the work, but
>>>>>>>> we're not opposed to having more passes, which might be beneficial for
>>>>>>>> non-game workloads (which is most code).
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > Yes, Rong is re-collecting performance data before submitting the
>>>>>>>> patch.
>>>>>>>> >
>>>>>>>> > - Warnings
>>>>>>>> >
>>>>>>>> > We identified 3 classes of issues which manifest as spammy
>>>>>>>> warnings when applying profile data with IRPGO (these affect FEPGO also I
>>>>>>>> believe, but we looked in depth at IRPGO):
>>>>>>>> >
>>>>>>>> > 1. The main concerning one is that getPGOFuncName mangles the
>>>>>>>> filename into the counter name. This causes us to get
>>>>>>>> instrprof_error::unknown_function when the pgo-use build is done in a
>>>>>>>> different build directory from the training build (which is a reasonable
>>>>>>>> thing to support). In this situation, PGO data is useless for all `static`
>>>>>>>> functions (and as a byproduct results in a huge volume of warnings).
>>>>>>>> >
>>>>>>>> > This can be enhanced with an user option to override the
>>>>>>>> behavior. Can you help filing a tracking bug?
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > 2. In different TU's, pre-instr inlining might make different
>>>>>>>> inlining decisions (for example, different functions may be available for
>>>>>>>> inlining), causing hash mismatch errors (instrprof_error::hash_mismatch).
>>>>>>>> In building a large game, we only saw 8 instance of this, so it is not as
>>>>>>>> severe as 1, but would be good to fix.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > Rong has a patch addressing that -- will submit after cleanup
>>>>>>>> pass change is done.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > 3. A .cpp file may be compiled and put into an archive, but then
>>>>>>>> not selected by the linker and will therefore not result in a counter in
>>>>>>>> the profraw. When compiling this file with pgo-use,
>>>>>>>> instrprof_error::unknown_function will result and a warning will be emitted.
>>>>>>>> >
>>>>>>>> > yes -- this is a common problem to other compilers as well.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > Case 1 can be fixed using a function hash or other unique
>>>>>>>> identifier instead of a file path. David, in D20195 you mentioned that Rong
>>>>>>>> was working on a patch that would fix 2; we are looking forward to that.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > Right.
>>>>>>>> >
>>>>>>>> > For 3, I unfortunately do not know of any solution. I don't think
>>>>>>>> there is a way for us to make this warning reliable in the face of this
>>>>>>>> circumstance. So my conclusion is that instrprof_error::unknown_function at
>>>>>>>> least must be defaulted to off unfortunately.
>>>>>>>> >
>>>>>>>> > yes, this can be annoying. If the warnings can be buffered, then
>>>>>>>> the compiler can check if this is due to missing profile for the whole file
>>>>>>>> and can reduce the warnings into one single warning (source file has no
>>>>>>>> profile data).  Making it off by default sounds fine to me too if it is too
>>>>>>>> noisy.
>>>>>>>> >
>>>>>>>> > thanks,
>>>>>>>> >
>>>>>>>> > David
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > -- Sean Silva
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160613/51b1d673/attachment.html>


More information about the llvm-dev mailing list