[PATCH] D120230: [SelectOpti][1/4] Setup new select-optimize pass

Mon Mar 14 22:36:47 PDT 2022

davidxl added a comment.

In D120230#3381563 <https://reviews.llvm.org/D120230#3381563>, @wenlei wrote:

> In D120230#3381552 <https://reviews.llvm.org/D120230#3381552>, @davidxl wrote:
>
>>> On that internal workload, we've got 6% less cmov with this pass turned on for IRPGO (it works, no correctness issue :-) ). But perf-wise it's neutral (we can measure 0.2% perf movement on that workload with high confidence).
>>
>> Does BOLT's cmov optimization improve performance for this workload?
>
> This is being worked on now and we don't have data yet. The numbers above didn't have BOLT interfering with cmov.
>
>>> Say we end up with cmov in one of the sample PGO iterations (either due to lack of profile, or profile indicating branch being unbiased), we would lose the control flow profile that is needed to tell how biased that original branch is, because we've turned that control flow into data flow. Unless we never use cmov for branches without profile info, we could keep generating cmov in future iterations even if branch becomes more biased later because we will never get control flow profile again.
>>>
>>> If we indeed never use cmov for branches without profile, that turns this problem into a typical sample PGO oscillation. That is not the case before this patch set; are we changing the behavior now? I'm also not sure if such oscillation is as easily mitigable as other oscillations like those from speculative ICP.
>>
>> Regarding BOLT's usage for this problem -- does it mean the profile data is not collected from the production binary but from the pre-BOLT binary in a training run? If this is the setup, the compiler can choose to minimize cmov generation for the sake of better profiling.
>>
>> David
>
> Right, making the compiler conservative so that it preserves branches, and we keep a control flow profile for BOLT to make the final decision, is the experiment we're doing. Pseudo-probe for sample PGO can also be tuned to be a bit more intrusive and disallow cmov for a better profile in that setup.

But in this case, the binary (from BOLT) used to generate the profile for the compiler still has cmov, so some loss of profile data is unavoidable. The oscillation issue will mostly be gone, though.
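
To make the two behaviors discussed above concrete, here is a toy simulation of successive sample PGO iterations for a single branch. It is only a sketch under stated assumptions: the function name, the 0.8 bias threshold, and the rule that a cmov leaves no control flow profile for the next iteration are all hypothetical simplifications, not the behavior of the pass in this patch or of BOLT.

```python
def simulate(true_bias, cmov_without_profile, bias_threshold=0.8):
    """Return the codegen decision ('branch' or 'cmov') for each PGO iteration.

    true_bias: hypothetical per-iteration probability of the hot direction.
    cmov_without_profile: policy used when the previous binary had a cmov,
        so no control flow profile exists for this branch.
    bias_threshold: hypothetical cutoff; below it the branch counts as unbiased.
    """
    decisions = []
    prev = "branch"  # assume the first profiled binary kept the branch
    for bias in true_bias:
        if prev == "branch":
            # The previous binary had a real branch, so its bias was sampled.
            decision = "branch" if bias >= bias_threshold else "cmov"
        else:
            # The previous binary had a cmov: the bias is unknown this round.
            decision = "cmov" if cmov_without_profile else "branch"
        decisions.append(decision)
        prev = decision
    return decisions


# The branch starts unbiased and later becomes strongly biased.
becomes_biased = [0.5, 0.5, 0.95, 0.95, 0.95, 0.95]
sticky = simulate(becomes_biased, cmov_without_profile=True)
recovers = simulate(becomes_biased, cmov_without_profile=False)

# A branch that stays unbiased flips back and forth under the second policy.
oscillates = simulate([0.5] * 6, cmov_without_profile=False)
```

With the first policy the decision stays at cmov even after the branch becomes biased, because the profile never comes back. With the second policy the decision recovers, at the cost of alternating between cmov and branch while the branch remains unbiased.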

>>>> The default misprediction rate used by the compiler (currently 25%) is expected to be less than the threshold that motivates a conversion to a cmov based on mispredict data. So, for example, if a branch mispredicts 50% of the time, we could convert that to a cmov. Then the cmov will get compared with a branch that mispredicts 25% of the time, making the branch perhaps more desirable than it would have been if we had mispredict data. The rest of the heuristics will not necessarily allow a conversion back to a branch, but the cmov decision will certainly be revertible.
>>>
>>> nit: saying misprediction rate here and in the RFC is a bit confusing because today we don't have that data in the profile. That threshold is how biased a branch is, which is a proxy for branch misses. But the branch predictor could still do well (low branch miss) for unbiased branches.
>>>
>>>> In terms of making this decision at the BOLT level, it might have more limited applicability compared to an LLVM IR pass, since it is a bit harder to find which branches are eligible to be converted to cmovs, and employing dataflow-based heuristics like the ones possible in LLVM IR seems quite tricky.
>>>
>>> Yes, that is a different challenge.
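
The 25% versus 50% argument quoted above can be illustrated with a deliberately simplified cost comparison. This is a sketch, not the cost model of the pass: the function names, the 14-cycle misprediction penalty, and the 5-cycle cmov critical-path cost are made-up numbers chosen only to show why a cmov picked from measured mispredict data could be compared against a cheaper-looking branch once that data is gone.

```python
def expected_branch_cost(mispredict_rate, mispredict_penalty):
    """Expected extra cycles per execution spent on mispredictions."""
    return mispredict_rate * mispredict_penalty


def prefers_cmov(mispredict_rate, mispredict_penalty, cmov_cost):
    """True when the hypothetical cmov cost beats the expected branch cost."""
    return expected_branch_cost(mispredict_rate, mispredict_penalty) > cmov_cost


PENALTY = 14     # hypothetical misprediction penalty, in cycles
CMOV_COST = 5    # hypothetical extra critical-path latency of the cmov

# With measured data saying the branch mispredicts 50% of the time:
with_measured_data = prefers_cmov(0.50, PENALTY, CMOV_COST)

# Later, with no data, the compiler falls back to the 25% default:
with_default_rate = prefers_cmov(0.25, PENALTY, CMOV_COST)
```

Under these assumed numbers the measured 50% rate favors the cmov (7 cycles against 5), while the 25% default favors the branch (3.5 cycles against 5), which is the direction of the effect described in the quoted paragraph.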

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120230/new/

https://reviews.llvm.org/D120230