[PATCH] D120230: [SelectOpti][1/4] Setup new select-optimize pass

Mon Mar 14 10:52:52 PDT 2022

apostolakis added a comment.

In D120230#3378380 <https://reviews.llvm.org/D120230#3378380>, @wenlei wrote:

> In D120230#3377783 <https://reviews.llvm.org/D120230#3377783>, @apostolakis wrote:
>
>> In D120230#3374674 <https://reviews.llvm.org/D120230#3374674>, @wenlei wrote:
>>
>>> Thanks for the patches. We've noticed similar problems as you described in RFC which often leads to way too aggressive cmov (in general and in comparison to gcc). Is the current version of this stack complete and functional? If so, we'd be happy to give it a try to see how it handles the suboptimal cases we spotted.
>>
>> This stack is complete and functional for x86 instr-PGO.
>
> Thanks. I'm trying this with IRPGO on a large internal workload now, should have results soon. Will also take a closer look at the patch set.

Great! Let me know if you see any regressions or issues.

>> It is not yet tuned for Sample-PGO and there are some Sample-PGO-specific improvements that are yet to be made, most notably leveraging LBR data to capture misprediction rates and incorporating them in the heuristics (as discussed with @modimo in the RFC).
>
> While we can have branch miss reported by perf tools and feedback into compiler sample PGO, it would be challenging to accurately correlate the actual branch miss to the unoptimized IR at profile loading time (it is a general challenge to apply any low level PMU info for compiler PGO).

As David said, FSAFDO might alleviate this problem.

>> So, if you are interested in Sample-PGO it is better to wait for the next patch series that will tailor this pass for that. At that point, we can iterate and refine to make sure that the pass addresses these suboptimal cases and avoids regressions for your workloads.
>
> For sample PGO, we're going to experiment with deferring some compiler if-convert to PLO (BOLT in our case), where the correlation of branch miss won't be a problem, and also to avoid the problem of being stuck with cmov. cc @Amir @maksfb

The lack of mispredict data for cmovs will be a problem, but we will not be stuck with a cmov decision; rather, we might observe some oscillations (which is still problematic, but not atypical of SampleFDO settings, and there are some known remedies). The default misprediction rate used by the compiler (currently 25%) is expected to be lower than the threshold that motivates a conversion to a cmov based on mispredict data. So, for example, if a branch mispredicts 50% of the time, we could convert it to a cmov. Once converted, the cmov has no mispredict data of its own, so it gets compared with a branch that is assumed to mispredict 25% of the time, which may make the branch look more desirable than it would if we had mispredict data. The rest of the heuristics will not necessarily allow a conversion back to a branch, but the cmov decision is certainly revertible.
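
To make the oscillation concrete, here is a minimal sketch that models only the misprediction-rate part of the decision. It is not code from this patch stack: the names `choose_form` and `simulate` and the 40% threshold are hypothetical, and the real pass applies further heuristics that may block the switch back to a branch. The only number taken from this thread is the 25% default.

```python
from typing import List, Optional

DEFAULT_MISPREDICT_RATE = 0.25  # default rate quoted in this thread
CMOV_THRESHOLD = 0.40           # hypothetical threshold, NOT a value from the patch


def choose_form(measured_rate: Optional[float]) -> str:
    """Pick "cmov" or "branch" from the misprediction rate alone.

    measured_rate is None when the profile has no mispredict data,
    in which case the default rate is assumed.
    """
    rate = DEFAULT_MISPREDICT_RATE if measured_rate is None else measured_rate
    return "cmov" if rate > CMOV_THRESHOLD else "branch"


def simulate(true_rate: float, rounds: int) -> List[str]:
    """Replay profile -> rebuild cycles and record the form chosen each round."""
    form = "branch"  # assume the first profiled binary still contains the branch
    history = []
    for _ in range(rounds):
        # Mispredict data exists only while the code is a branch;
        # a cmov in the profiled binary yields no misprediction rate.
        measured = true_rate if form == "branch" else None
        form = choose_form(measured)
        history.append(form)
    return history
```

Under these assumptions, `simulate(0.5, 4)` alternates between "cmov" and "branch" on every rebuild, while a well-predicted branch such as `simulate(0.1, 3)` stays a branch throughout.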

In terms of making this decision at the BOLT level, it might have more limited applicability than an LLVM IR pass: it is a bit harder to find which branches are eligible to be converted to cmovs, and employing dataflow-based heuristics like the ones possible in LLVM IR seems quite tricky.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120230/new/

https://reviews.llvm.org/D120230