[PATCH] D120230: [SelectOpti][1/4] Setup new select-optimize pass

Mon Mar 14 21:13:26 PDT 2022

wenlei added a comment.

In D120230#3380072 <https://reviews.llvm.org/D120230#3380072>, @apostolakis wrote:

> In D120230#3378380 <https://reviews.llvm.org/D120230#3378380>, @wenlei wrote:
>
>> In D120230#3377783 <https://reviews.llvm.org/D120230#3377783>, @apostolakis wrote:
>>
>>> In D120230#3374674 <https://reviews.llvm.org/D120230#3374674>, @wenlei wrote:
>>>
>>>> Thanks for the patches. We've noticed similar problems as you described in RFC which often leads to way too aggressive cmov (in general and in comparison to gcc). Is the current version of this stack complete and functional? If so, we'd be happy to give it a try to see how it handles the suboptimal cases we spotted.
>>>
>>> This stack is complete and functional for x86 instr-PGO.
>>
>> Thanks. I'm trying this with IRPGO on a large internal workload now, should have results soon. Will also take a closer look at the patch set.
>
> Great! Let me know if you see any regressions or issues.

On that internal workload, we've got 6% fewer cmovs with this pass turned on for IRPGO (it works, no correctness issues :-) ). But perf-wise it's neutral (for reference, we can measure perf movements as small as 0.2% on that workload with high confidence).

>>> It is not yet tuned for Sample-PGO and there are some Sample-PGO-specific improvements that are yet to be made, most notably leveraging LBR data to capture misprediction rates and incorporating them in the heuristics (as discussed with @modimo in the RFC).
>>
>> While we can have branch miss reported by perf tools and feedback into compiler sample PGO, it would be challenging to accurately correlate the actual branch miss to the unoptimized IR at profile loading time (it is a general challenge to apply any low level PMU info for compiler PGO).
>
> As David said, FSAFDO might alleviate this problem.
>
>>> So, if you are interested in Sample-PGO it is better to wait for the next patch series that will tailor this pass for that. At that point, we can iterate and refine to make sure that the pass addresses these suboptimal cases and avoids regressions for your workloads.
>>
>> For sample PGO, we're going to experiment with deferring some compiler if-convert to PLO (BOLT in our case), where the correlation of branch miss won't be a problem, and also to avoid the problem of being stuck with cmov. cc @Amir @maksfb
>
> The lack of mispredict data for cmovs will be a problem but we will not be stuck with a cmov decision but rather we might observe some oscillations (which is still problematic but not atypical of SampleFDO settings and there are some known remedies).

Say we end up with a cmov in one of the sample PGO iterations (either due to lack of profile, or because the profile indicates the branch is unbiased). We would then lose the control flow profile needed to tell how biased the original branch is, because we've turned that control flow into data flow. Unless we never use cmov for branches without profile info, we could keep generating cmov in future iterations even if the branch becomes more biased later, because we will never get a control flow profile for it again.

If we indeed never use cmov for branches without profile, that turns this problem into a typical sample PGO oscillation. That was not the case before this patch set; are we changing the behavior now? I'm also not sure whether such an oscillation is as easily mitigated as other oscillations, like those from speculative ICP.
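
To make the two scenarios concrete, here is a toy model of successive sample PGO iterations. It is only a sketch under stated assumptions, not what the patch set implements: the function names, the 0.75 bias threshold, and the idea of a single "no profile" default are all hypothetical. The one property it encodes is the one described above: a branch that was emitted as a cmov yields no control flow profile for the next iteration.

```python
def next_form(current_form, true_bias, no_profile_default, bias_threshold=0.75):
    """Choose the form for the next build from what the current build can report.

    Hypothetical model: bias is max(p_taken, 1 - p_taken), and a branch is kept
    only if its observed bias reaches bias_threshold (an assumed value).
    """
    # A cmov has no control flow edges to sample, so the bias is unobservable.
    observed_bias = true_bias if current_form == "branch" else None
    if observed_bias is None:
        return no_profile_default
    return "branch" if observed_bias >= bias_threshold else "cmov"


def simulate(bias_per_iteration, no_profile_default, start="branch"):
    """Return the form used in each build, starting from `start`."""
    forms = [start]
    for bias in bias_per_iteration:
        forms.append(next_form(forms[-1], bias, no_profile_default))
    return forms


# The branch is unbiased in the first iteration, then becomes heavily biased.
becomes_biased = [0.5, 0.95, 0.95, 0.95]

# Defaulting to cmov without profile: stuck on cmov despite the later bias.
print(simulate(becomes_biased, no_profile_default="cmov"))
# ['branch', 'cmov', 'cmov', 'cmov', 'cmov']

# Defaulting to branch without profile: the decision is revertible.
print(simulate(becomes_biased, no_profile_default="branch"))
# ['branch', 'cmov', 'branch', 'branch', 'branch']

# Same default, but the branch stays unbiased: the form oscillates.
print(simulate([0.5] * 4, no_profile_default="branch"))
# ['branch', 'cmov', 'branch', 'cmov', 'branch']
```

The real heuristics are cost-model based rather than a single threshold, so this only illustrates the shape of the problem: which default is used for profile-less branches decides whether we get "stuck" or "oscillating".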

> The default misprediction rate used by the compiler (currently 25%) is expected to be less than the threshold that motivates a conversion to a cmov based on mispredict data. So, for example if a branch mispredicts 50% of the time, we could convert that to a cmov. Then the cmov will get compared with a branch that mispredicts 25% of the time, making the branch perhaps more desirable than it would have been if we had mispredict data. It is not necessary that the rest of the heuristics will allow a conversion back to a branch, but the cmov decision will be for sure revertible.

nit: saying "misprediction rate" here and in the RFC is a bit confusing, because today we don't have that data in the profile. That threshold is really about how biased a branch is, which is a proxy for branch misses. But the branch predictor could still do well (low branch miss rate) on unbiased branches.

> In terms of making this decision at the BOLT level, it might have more limited applicability compared to a LLVM IR pass since it is a bit harder to find which branches are eligible to be converted to cmovs and employing dataflow-based heuristics as the ones possible in LLVM IR seem quite tricky.

Yes, that is a different challenge.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D120230