[PATCH] D98213: [InlineCost] Enable the cost benefit analysis on FDO

Fri Jan 20 16:01:45 PST 2023

davidxl added a comment.

In D98213#4070466 <https://reviews.llvm.org/D98213#4070466>, @aemerson wrote:

>> Note the size here is not static code size (it excludes cold code). It is actually a proxy to model the runtime cost due to increased instruction footprint (icache pressure). The multiplier's role is to make the savings and 'size' cost comparable in terms of unit. The cycle name here is also counted at IR level, so not at low level.
>
> I understand that, but it still doesn't make the comparison of different units any better. Introducing a scaling factor is irrelevant, it's just an arbitrary scalar.

Consider the multiplier to be "the average cycle cost due to frontend stalls per byte of instructions"; then you will see its meaning and why it is appropriate here.
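
To make the unit argument concrete, here is a minimal sketch of that reading. It is not the code in this patch: the function name, the parameter names, and the use of a floating-point per-byte cost are all assumptions for illustration. The point is only that once the size is multiplied by a cycles-per-byte factor, both sides of the comparison are in cycles.

```cpp
#include <cstdint>

// Hypothetical helper (not LLVM's API). Returns true when the estimated
// cycle savings at a call site cover the estimated frontend-stall cost of
// the extra non-cold code brought in by inlining.
//   cycleSavings       : IR-level cycle savings, already weighted by profile count
//   hotSizeBytes       : size of the inlined body, excluding cold code
//   stallCyclesPerByte : assumed average frontend-stall cycles per byte
bool inliningPays(std::uint64_t cycleSavings, std::uint64_t hotSizeBytes,
                  double stallCyclesPerByte) {
  // bytes * (cycles / byte) = cycles, so the two sides are comparable.
  const double sizeCostInCycles =
      static_cast<double>(hotSizeBytes) * stallCyclesPerByte;
  return static_cast<double>(cycleSavings) >= sizeCostInCycles;
}
```

For example, with 1000 cycles saved and 100 bytes added, the callee is worth inlining at 5 stall cycles per byte (cost 500) but not at 20 (cost 2000).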

>> Note that FDO is intended to improve performance and the compile time budget is larger. This all seems WAI to me. If size is important, use -Os or -Oz, or even better use MLGO.
>
> The cases we've seen involve orders of magnitude increase, e.g. 100x bigger size and compile time. Not working as intended.

Which section is affected the most in your case: .text or .text.hot? FDO may increase .text, but the .text.unlikely section will be optimized for size, and it is not uncommon to see a reduced total size with FDO (with cost-benefit analysis on). Spending some time reducing the test case will make the discussion more meaningful.

>   // We use 128-bit APInt here to avoid potential overflow.  This variable
>   // should stay well below 10^^24 (or 2^^80) in practice.  This "worst" case
>   // assumes that we can avoid or fold a billion instructions, each with a
>   // profile count of 10^^15 -- roughly the number of cycles for a 24-hour
>   // period on a 4GHz machine.
>
> If you potentially need a 128 bit integer to store your "cycle savings", you should not be using that value to compare against a size cost. A sufficiently good "saving" will absolutely override a huge size cost.
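
As a sanity check on the numbers in the quoted comment, the following sketch redoes the arithmetic at compile time. The helper names are hypothetical, and `unsigned __int128` is a GCC/Clang extension used here only for the check. It shows that the stated worst case (10^9 instructions, each with a profile count of 10^15) overflows 64 bits but stays below 2^80, and that a 4GHz machine runs about 3.5 * 10^14 cycles in 24 hours, which the comment rounds up to 10^15.

```cpp
#include <cstdint>

// GCC/Clang extension; stands in for the 128-bit APInt the comment mentions.
using u128 = unsigned __int128;

// 10^n as a 128-bit value (hypothetical helper, C++14 constexpr).
constexpr u128 tenToThe(unsigned n) {
  u128 result = 1;
  for (unsigned i = 0; i < n; ++i)
    result *= 10;
  return result;
}

// Worst case from the comment: a billion instructions avoided or folded,
// each with a profile count of 10^15.
constexpr u128 worstCaseSavings() { return tenToThe(9) * tenToThe(15); }

// Cycles in a 24-hour period on a 4GHz machine.
constexpr std::uint64_t cyclesPerDayAt4GHz() {
  return 4000000000ULL * 86400ULL;
}

static_assert(worstCaseSavings() == tenToThe(24), "worst case is 10^24");
static_assert(worstCaseSavings() > u128(UINT64_MAX), "does not fit in 64 bits");
static_assert(worstCaseSavings() < (u128(1) << 80), "stays below 2^80");
static_assert(cyclesPerDayAt4GHz() == 345600000000000ULL, "about 3.5 * 10^14");
```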

I guess that in your case there are functions with huge cold blocks that got inlined into many different call sites, resulting in bloat. You may want to try partial inlining to see if it helps.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98213/new/

https://reviews.llvm.org/D98213