[all-commits] [llvm/llvm-project] 498471: [BOLT] Fix density for jump-through functions (#14...

Amir Ayupov via All-commits all-commits at lists.llvm.org
Wed Jun 25 22:20:59 PDT 2025


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 49847148d4b1e4139833766aedb843a40a147039
      https://github.com/llvm/llvm-project/commit/49847148d4b1e4139833766aedb843a40a147039
  Author: Amir Ayupov <aaupov at fb.com>
  Date:   2025-06-25 (Wed, 25 Jun 2025)

  Changed paths:
    M bolt/lib/Passes/BinaryPasses.cpp
    A bolt/test/X86/zero-density.s

  Log Message:
  -----------
  [BOLT] Fix density for jump-through functions (#145619)

Address the issue that stems from how the density is computed.

Binary *function* density is the ratio of its total dynamic number of
executed bytes over the static size in bytes. The meaning of it is the
amount of dynamic profile information relative to its static size.

Binary *profile* density is the minimum *function* density among *well-
-profiled* functions, taken as functions covering p99 samples, or, in
other words, excluding functions in the tail 1% of samples. p99 is an
arbitrary cutoff. The meaning of profile density is the *minimum amount
of profile information per function* to be able to optimize the program
well. The threshold for profile density is set empirically.

The dynamically executed bytes are taken directly from LBR fall-throughs
and for LBRs recorded in trampoline functions, such as
```
000000001a941ec0 <Sleef_expf8_u10>:
1a941ec0: jmpq *0x37b911fa(%rip) # <pnt_expf8_u10>
1a941ec6: nopw %cs:(%rax,%rax)
```
the fall-through has zero length:
```
# Branch   Target   NextBranch  Count
T 1b171cf6 1a941ec0 1a941ec0    568562
```

But it's not correct to say this function has zero executed bytes, just
the size of the next branch is not included in the fall-through.

If such functions have non-trivial sample count, they will fall in p99
samples, and cause the profile density to be zero.

To solve this, we can either:
1. Include fall-through end jump size into executed bytes:
   is logically sound but technically challenging: the size needs to
   come from disassembly (expensive), and the threshold need to be
   reevaluated with updated definition of binary function density.
2. Exclude pass-through functions from density computation:
   follows the intent of profile density which is to set the amount of 
   profile information needed to optimize the function well. Single
   instruction pass-through functions don't need samples many times 
   the size to be optimized well.

Go with option 2 as a reasonable compromise.

Test Plan: added bolt/test/X86/zero-density.s



To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications


More information about the All-commits mailing list