[all-commits] [llvm/llvm-project] 780f69: [Clang][CMake] Add CSSPGO support to LLVM_BUILD_IN...

Amir Ayupov via All-commits all-commits at lists.llvm.org
Wed Oct 1 15:25:56 PDT 2025


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 780f69cd922d8925648e11e771e77f0b46190e5b
      https://github.com/llvm/llvm-project/commit/780f69cd922d8925648e11e771e77f0b46190e5b
  Author: Amir Ayupov <aaupov at fb.com>
  Date:   2025-10-01 (Wed, 01 Oct 2025)

  Changed paths:
    M clang/CMakeLists.txt
    A clang/cmake/caches/BOLT-CSSPGO.cmake
    M clang/cmake/caches/BOLT-PGO.cmake
    A clang/cmake/caches/CSSPGO.cmake
    M clang/utils/perf-training/CMakeLists.txt
    M clang/utils/perf-training/perf-helper.py
    M llvm/CMakeLists.txt
    M llvm/cmake/modules/HandleLLVMOptions.cmake

  Log Message:
  -----------
  [Clang][CMake] Add CSSPGO support to LLVM_BUILD_INSTRUMENTED (#79942)

Build on the Clang-BOLT infrastructure to collect a sample profile for CSSPGO.
Add CSSPGO.cmake and BOLT-CSSPGO.cmake to automate CSSPGO and CSSPGO+BOLT
Clang builds.

Note that `CLANG_PGO_TRAINING_DATA_SOURCE_DIR` is required because the
built-in training set is inadequate for collecting a sampled profile.

Hardware compatibility: CSSPGO requires synchronized (0-skid) call
and branch stacks, which are only available with Intel PEBS (Sandy
Bridge+), AMD Zen3 with BRS, Zen4 with LBRv2+LBR_PMC_FREEZE, and Zen5
with LBRv2. This patch adds support for the Intel
`br_inst_retired.near_taken:uppp` event.

Test Plan:
Added BOLT-CSSPGO.cmake, used the same way as BOLT-PGO.cmake,
e.g. for a bootstrapped ThinLTO+CSSPGO+BOLT build, with the CSSPGO profile
collected from an LLVM build and the BOLT profile collected from Hello World
(instrumentation):
```
cmake -B clang-csspgo-bolt -S /path/to/llvm-project/llvm \
-DLLVM_ENABLE_LLD=ON -DBOOTSTRAP_LLVM_ENABLE_LLD=ON \
-DBOOTSTRAP_BOOTSTRAP_LLVM_ENABLE_LLD=ON \
-DPGO_INSTRUMENT_LTO=Thin \
-DBOOTSTRAP_CLANG_PGO_TRAINING_DATA_SOURCE_DIR=/path/to/llvm-project/llvm \
-GNinja  -C /path/to/llvm-project/clang/cmake/caches/BOLT-CSSPGO.cmake
ninja stage2-clang-bolt
...
warning: Sample PGO is estimated to optimize better with 19.5x more samples. Please consider increasing sampling rate or profiling for longer duration to get more samples.
...
[2800/2801] Optimizing Clang with BOLT
BOLT-INFO: 8189 out of 106942 functions in the binary (7.7%) have non-empty execution profile
            13776393 : taken branches (-42.1%)
```

Performance testing with Clang:
- Setup: Clang-BOLT testing harness
https://github.com/aaupov/llvm-devmtg-2022/commit/9f2b46f67a1930a51c58a0e4894637a8c64c570e
  - CSSPGO training: building LLVM, 
  - InstrPGO training: building Hello World,
  - BOLT training: building Hello World, instrumentation,
  - benchmark: building a small LLVM tool (`not`),
- 2S Intel SKX Xeon 6138 with 40C/80T and 256GB RAM, using 20C/40T for the build,
- Results, wall time, lower is better
  - Baseline (bootstrapped build): 10.36s,
  - InstrPGO + ThinLTO: 9.34s,
  - CSSPGO + ThinLTO: 8.85s.
- BOLT results, for reference:
  - Baseline: 9.09s,
  - InstrPGO + ThinLTO: 9.09s,
  - CSSPGO + ThinLTO: 8.58s.
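
For readers comparing the configurations, the wall times above can be turned
into relative reductions. The following is a minimal sketch, not part of the
patch: the helper names (`reduction_pct`, `summarize`) are hypothetical, and it
assumes each group's "Baseline" entry is the reference point for that group.

```python
# Wall times in seconds, copied from the results listed above.
# Grouping into "without BOLT" and "with BOLT" is an interpretation
# of the two result lists, not something the patch defines.
without_bolt = {
    "Baseline": 10.36,
    "InstrPGO + ThinLTO": 9.34,
    "CSSPGO + ThinLTO": 8.85,
}
with_bolt = {
    "Baseline": 9.09,
    "InstrPGO + ThinLTO": 9.09,
    "CSSPGO + ThinLTO": 8.58,
}


def reduction_pct(baseline: float, value: float) -> float:
    """Percentage reduction in wall time relative to the baseline."""
    return (baseline - value) / baseline * 100.0


def summarize(times: dict) -> dict:
    """Map each configuration to its wall-time reduction, rounded to 0.1%."""
    base = times["Baseline"]
    return {name: round(reduction_pct(base, t), 1) for name, t in times.items()}


if __name__ == "__main__":
    for label, times in (("without BOLT", without_bolt), ("with BOLT", with_bolt)):
        for name, pct in summarize(times).items():
            print(f"{label:>12} | {name:<20} | {pct:5.1f}% less wall time")
```

Under these assumptions, CSSPGO + ThinLTO comes out at roughly 14.6% less wall
time than the bootstrapped baseline, versus roughly 9.8% for InstrPGO + ThinLTO.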

---------

Co-authored-by: Matthias Braun <matze at braunis.de>


