[PATCH] D98213: [InlineCost] Enable the cost benefit analysis on FDO

Fri Jan 20 15:19:04 PST 2023

davidxl added a comment.

In D98213#4070344 <https://reviews.llvm.org/D98213#4070344>, @aemerson wrote:

> In D98213#4070282 <https://reviews.llvm.org/D98213#4070282>, @davidxl wrote:
>
>> In D98213#4070227 <https://reviews.llvm.org/D98213#4070227>, @aemerson wrote:
>>
>>> I want to resurrect this change that enabled these cost-benefit heuristics originally developed in https://reviews.llvm.org/D92780
>>>
>>> There are a number of issues with the original change that I can see, and this has caused all manner of problems for us internally as we moved to the new pass manager (and therefore started using these heuristics).
>>>
>>> The main one being that the heuristics seem to completely override all sensible thresholds for code size. The feature caims to compare the performance gains by measuring the cycles saved from a PGO run, and comparing against some threshold. This doesn't make sense since the length of the runs for PGO are arbitrary, and you shouldn't be using them in absolute terms. The only way it makes sense is if you have a very specific workload and the PGO profile run is exactly the same as the real world runs. The original change gave absolutely no justification, formal or intuitive, for the use of the equation to determine profitability. Giving positive runtime results is no justification for completely overriding any reasonable thresholds and potentially impacting code size & compile time by several orders of magnitude as we've seen internally at Apple.
>>>
>>> Even worse: D92780 <https://reviews.llvm.org/D92780> was accepted into mainline LLVM with absolutely no tests, and then enabled for all users in this change. I didn't even see any discussion about tests, or explanation for why they were impractical. We've disabled these heuristics downstream but I absolutely believe that this must be reverted upstream too until it can be re-worked and re-reviewed in detail.
>>
>> FYI: the cycle savings are normalized by dividing the function entry count, so it is not used in absolute sense.
>
> Right, that's fair enough. See the following comment from the code:
>
>   //  CycleSavings      PSI->getOrCompHotCountThreshold()
>   // -------------- >= -----------------------------------
>   //       Size              InlineSavingsMultiplier
>
> Cycles are being compared against code size here. How is that metric supposed to be interpreted? What does (cycles/entry)/(bytes) even mean as a unit of measurement? The feature itself seems to be trying to optimize too fine details in machine cycles in a part of the compiler that shouldn't be dealing with such low level specifics.

Note the size here is not static code size (it excludes cold code). It is actually a proxy to model the runtime cost due to increased instruction footprint (icache pressure).   The multiplier's role is to make the savings and 'size' cost comparable in terms of unit.   The cycle name here is also counted at IR level, so not at low level.

Note that the multiplier should indeed be a target platform dependent -- and will be tuned in the future.

>> I think it is worth adding some tests.
>>
>> For the problems you have (I suppose it is performance related?) More details would be helpful (perhaps using a bug to document it).
>
> It's code size and compile time. These heuristics seem to be returning effectively an always-inline cost, which forces the inliner to inline huge functions. Clearly the heuristics should be much more conservative. Unfortunately I can't share the test case since it's internal code and massive.

Note that FDO is intended to be improve performance and the compile time budget is larger. This all seems WAI to me.  If size is important, use -Os or -Oz, or even better use MLGO.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98213/new/

https://reviews.llvm.org/D98213