[all-commits] [llvm/llvm-project] a41900: [CSSPGO][Preinliner] Use linear threshold to drive...

Sun May 8 22:23:47 PDT 2022

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: a4190037fac06c2b0cc71b8bb90de9e6b570ebb5
      https://github.com/llvm/llvm-project/commit/a4190037fac06c2b0cc71b8bb90de9e6b570ebb5
  Author: Hongtao Yu <hoy at fb.com>
  Date:   2022-05-08 (Sun, 08 May 2022)

  Changed paths:
    M llvm/tools/llvm-profgen/CSPreInliner.cpp
    M llvm/tools/llvm-profgen/CSPreInliner.h
    M llvm/tools/llvm-profgen/ProfileGenerator.cpp
    M llvm/tools/llvm-profgen/ProfileGenerator.h

  Log Message:
  -----------
  [CSSPGO][Preinliner] Use linear threshold to drive inline decision.

The per-callsite size threshold used today to drive preinline decision is based on hotness/coldness cutoff. The default setup is for callsites with a sample count above the hotness cutoff (99%), a 1500 size threshold is used. Any callsite below 99.99% coldness cutoff uses a zero threshold. This has a couple issues:

1. While both cutoffs and size thoresholds are configurable, different applications may need different setups, making a universal setup impractical.

2. The callsites between hotness cutoff and coldness cutoff are not considered as inline candidates, which could be a missing opportunity.

3. Hot callsites always use the same threshold. In reality we may want a bigger threshold for hotter callsites.

In this change we are introducing a linear threshold regardless of hot/cold cutoffs. Given a sample space, a threshold is computed for a callsite based on the position of that callsite sample in the whole space. With that we no longer need to define what's hot or cold. Callsites with different hotness will get a different threshold. This should overcome the above three issues.

I have seen good results with a universal default setup for two of our internal services.

For one service, 0.2% to 0.5% perf improvement over a baseline with a previous default setup, on-par code size.
For the second service, 0.5% to 0.8% perf improvement over a baseline with a previous default setup, 0.2% code size increase; on-par performance and code size with a baseline that is with a carefully tuned cutoff to cover enough hot functions.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D125023