[all-commits] [llvm/llvm-project] e3cf80: BlockFrequencyInfoImpl: Avoid big numbers, increas...

Tue Oct 24 20:27:52 PDT 2023

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: e3cf80c5c1fe55efd8216575ccadea0ab087e79c
      https://github.com/llvm/llvm-project/commit/e3cf80c5c1fe55efd8216575ccadea0ab087e79c
  Author: Matthias Braun <matze at braunis.de>
  Date:   2023-10-24 (Tue, 24 Oct 2023)

  Changed paths:
    M compiler-rt/test/profile/Inputs/instrprof-gcov-multiple-bbs-single-line.c.gcov
    M llvm/lib/Analysis/BlockFrequencyInfoImpl.cpp
    M llvm/test/Analysis/BlockFrequencyInfo/loops_with_profile_info.ll
    A llvm/test/Analysis/BlockFrequencyInfo/precision.ll
    M llvm/test/CodeGen/AArch64/arm64-spill-remarks-treshold-hotness.ll
    M llvm/test/CodeGen/AArch64/cfi-fixup.ll
    M llvm/test/CodeGen/AArch64/redundant-mov-from-zero-extend.ll
    M llvm/test/CodeGen/AArch64/win64-jumptable.ll
    M llvm/test/CodeGen/AArch64/wineh-bti.ll
    M llvm/test/CodeGen/AMDGPU/greedy-broken-ssa-verifier-error.mir
    M llvm/test/CodeGen/AMDGPU/machine-sink-temporal-divergence-swdev407790.ll
    M llvm/test/CodeGen/AMDGPU/optimize-negated-cond.ll
    M llvm/test/CodeGen/AMDGPU/tuple-allocation-failure.ll
    M llvm/test/CodeGen/ARM/indirectbr.ll
    M llvm/test/CodeGen/ARM/v8m.base-jumptable_alignment.ll
    M llvm/test/CodeGen/Mips/indirect-jump-hazard/jumptables.ll
    M llvm/test/CodeGen/Mips/jump-table-mul.ll
    M llvm/test/CodeGen/Mips/nacl-align.ll
    M llvm/test/CodeGen/Mips/pseudo-jump-fill.ll
    M llvm/test/CodeGen/PowerPC/aix-lower-jump-table.ll
    M llvm/test/CodeGen/PowerPC/jump-tables-collapse-rotate.ll
    M llvm/test/CodeGen/PowerPC/p10-spill-crgt.ll
    M llvm/test/CodeGen/PowerPC/p10-spill-crlt.ll
    M llvm/test/CodeGen/PowerPC/pr45448.ll
    M llvm/test/CodeGen/PowerPC/reduce_cr.ll
    M llvm/test/CodeGen/PowerPC/tail-dup-layout.ll
    M llvm/test/CodeGen/RISCV/branch-relaxation.ll
    M llvm/test/CodeGen/RISCV/jumptable.ll
    M llvm/test/CodeGen/RISCV/shrinkwrap-jump-table.ll
    M llvm/test/CodeGen/Thumb2/bti-indirect-branches.ll
    M llvm/test/CodeGen/Thumb2/constant-hoisting.ll
    M llvm/test/CodeGen/Thumb2/mve-blockplacement.ll
    M llvm/test/CodeGen/Thumb2/mve-float16regloops.ll
    M llvm/test/CodeGen/Thumb2/mve-float32regloops.ll
    M llvm/test/CodeGen/Thumb2/mve-pred-vselect.ll
    M llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll
    M llvm/test/CodeGen/Thumb2/v8_IT_5.ll
    M llvm/test/CodeGen/VE/Scalar/br_jt.ll
    M llvm/test/CodeGen/VE/Scalar/brind.ll
    M llvm/test/CodeGen/X86/2008-04-17-CoalescerBug.ll
    M llvm/test/CodeGen/X86/2009-08-12-badswitch.ll
    M llvm/test/CodeGen/X86/bb_rotate.ll
    M llvm/test/CodeGen/X86/callbr-asm-outputs.ll
    M llvm/test/CodeGen/X86/code_placement_ext_tsp_large.ll
    M llvm/test/CodeGen/X86/conditional-tailcall.ll
    M llvm/test/CodeGen/X86/div-rem-pair-recomposition-signed.ll
    M llvm/test/CodeGen/X86/div-rem-pair-recomposition-unsigned.ll
    M llvm/test/CodeGen/X86/dup-cost.ll
    M llvm/test/CodeGen/X86/fsafdo_test3.ll
    M llvm/test/CodeGen/X86/mul-constant-result.ll
    M llvm/test/CodeGen/X86/pic.ll
    M llvm/test/CodeGen/X86/pr38795.ll
    M llvm/test/CodeGen/X86/speculative-load-hardening-indirect.ll
    M llvm/test/CodeGen/X86/statepoint-ra.ll
    M llvm/test/CodeGen/X86/switch-bt.ll
    M llvm/test/CodeGen/X86/switch.ll
    M llvm/test/CodeGen/X86/tail-dup-multiple-latch-loop.ll
    M llvm/test/CodeGen/X86/tail-dup-no-other-successor.ll
    M llvm/test/CodeGen/X86/tail-opts.ll
    M llvm/test/CodeGen/X86/tailcall-cgp-dup.ll
    M llvm/test/CodeGen/X86/win-catchpad.ll
    M llvm/test/CodeGen/X86/win64-jumptable.ll
    M llvm/test/Other/cfg-printer-branch-weights.ll
    M llvm/test/ThinLTO/X86/function_entry_count.ll
    M llvm/test/Transforms/CodeExtractor/MultipleExitBranchProb.ll
    M llvm/test/Transforms/ConstantHoisting/X86/pr52689-not-all-uses-rebased.ll
    M llvm/test/Transforms/JumpThreading/thread-prob-7.ll
    M llvm/test/Transforms/JumpThreading/update-edge-weight.ll
    M llvm/test/Transforms/LICM/loopsink.ll
    M llvm/test/Transforms/LoopDataPrefetch/AArch64/opt-remark-with-hotness.ll
    M llvm/test/Transforms/LoopDistribute/diagnostics-with-hotness.ll
    M llvm/test/Transforms/LoopRotate/update-branch-weights.ll
    M llvm/test/Transforms/LoopVectorize/X86/avx512.ll
    M llvm/test/Transforms/LoopVectorize/X86/no_fpmath_with_hotness.ll
    M llvm/test/Transforms/LoopVectorize/diag-with-hotness-info-2.ll
    M llvm/test/Transforms/LoopVectorize/diag-with-hotness-info.ll
    M llvm/test/Transforms/PGOProfile/Inputs/PR41279_2.proftext
    M llvm/test/Transforms/PGOProfile/Inputs/bfi_verification.proftext
    M llvm/test/Transforms/PGOProfile/Inputs/criticaledge.proftext
    M llvm/test/Transforms/PGOProfile/Inputs/criticaledge_entry.proftext
    M llvm/test/Transforms/PGOProfile/Inputs/indirectbr.proftext
    M llvm/test/Transforms/PGOProfile/Inputs/indirectbr_entry.proftext
    M llvm/test/Transforms/PGOProfile/PR41279_2.ll
    M llvm/test/Transforms/PGOProfile/bfi_verification.ll
    M llvm/test/Transforms/PGOProfile/criticaledge.ll
    M llvm/test/Transforms/PGOProfile/fix_bfi.ll
    M llvm/test/Transforms/PGOProfile/loop2.ll
    M llvm/test/Transforms/SampleProfile/profile-correlation-irreducible-loops.ll
    M llvm/test/Transforms/SampleProfile/profile-inference-rebalance.ll
    M llvm/test/Transforms/SampleProfile/pseudo-probe-update-2.ll

  Log Message:
  -----------
  BlockFrequencyInfoImpl: Avoid big numbers, increase precision for small spreads

BlockFrequencyInfo calculates block frequencies as Scaled64 numbers but, as a last step, converts them to unsigned 64-bit integers (`BlockFrequency`). This commit improves the factors picked for this conversion in order to:

* Avoid big numbers close to UINT64_MAX, so that users do not overflow or saturate when adding multiple frequencies together or when multiplying them with integers. The topmost 10 bits are left unused to provide headroom.
* Spread the difference between the hottest and coldest blocks as much as possible to increase precision.
* If the hot/cold spread cannot be represented, lose precision at the lower end but keep the frequencies of hot blocks at the upper end differentiable.
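The conversion goals listed above can be illustrated with a simplified sketch. It is not the code in llvm/lib/Analysis/BlockFrequencyInfoImpl.cpp: it assumes plain `double` values stand in for `Scaled64`, and the names `toIntegerFrequencies`, `kSlackBits`, `kUsableBits` and `kLimit` are hypothetical. It scales the hottest block to the top of a 54-bit range (leaving the topmost 10 bits free) and clamps any block that would fall below 1 up to 1, so a spread that is too large loses precision only at the cold end.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical constants for this sketch: keep the top 10 bits of a uint64_t
// free so callers can add or multiply frequencies without overflowing.
constexpr unsigned kSlackBits = 10;
constexpr unsigned kUsableBits = 64 - kSlackBits;              // 54
constexpr uint64_t kLimit = (uint64_t{1} << kUsableBits) - 1;  // 2^54 - 1

// Hypothetical helper: converts floating-point frequencies (a stand-in for
// Scaled64) to integers. The hottest block is mapped to the top of the usable
// range, which spreads hot and cold blocks as far apart as possible. Blocks
// that would scale below 1 are clamped to 1, so precision is lost only at the
// cold end while hot blocks stay distinguishable.
std::vector<uint64_t> toIntegerFrequencies(const std::vector<double> &Freqs) {
  assert(!Freqs.empty());
  const double Max = *std::max_element(Freqs.begin(), Freqs.end());
  assert(Max > 0.0);
  const double Target = std::ldexp(1.0, static_cast<int>(kUsableBits));  // 2^54
  const double Scale = Target / Max;

  std::vector<uint64_t> Result;
  Result.reserve(Freqs.size());
  for (double F : Freqs) {
    const double Scaled = F * Scale;
    uint64_t V;
    if (Scaled >= Target)
      V = kLimit;  // The hottest block (modulo rounding) gets the largest value.
    else if (Scaled < 1.0)
      V = 1;       // Cold blocks saturate at 1 instead of collapsing to 0.
    else
      V = static_cast<uint64_t>(std::llround(Scaled));
    Result.push_back(V);
  }
  return Result;
}
```

With a small spread such as `{1.0, 4.0, 1024.0}` the sketch yields 2^44, 2^46 and 2^54 - 1, preserving the ratios; with `{1.0, 2.0, 1e30}` the two cold blocks both become 1 while the hot block stays near 2^54. Because every result is below 2^54, up to 1024 maximum-frequency values can be summed without overflowing a `uint64_t`.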