[all-commits] [llvm/llvm-project] 780f69: [Clang][CMake] Add CSSPGO support to LLVM_BUILD_IN...
Amir Ayupov via All-commits
all-commits at lists.llvm.org
Wed Oct 1 15:25:56 PDT 2025
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 780f69cd922d8925648e11e771e77f0b46190e5b
https://github.com/llvm/llvm-project/commit/780f69cd922d8925648e11e771e77f0b46190e5b
Author: Amir Ayupov <aaupov at fb.com>
Date: 2025-10-01 (Wed, 01 Oct 2025)
Changed paths:
M clang/CMakeLists.txt
A clang/cmake/caches/BOLT-CSSPGO.cmake
M clang/cmake/caches/BOLT-PGO.cmake
A clang/cmake/caches/CSSPGO.cmake
M clang/utils/perf-training/CMakeLists.txt
M clang/utils/perf-training/perf-helper.py
M llvm/CMakeLists.txt
M llvm/cmake/modules/HandleLLVMOptions.cmake
Log Message:
-----------
[Clang][CMake] Add CSSPGO support to LLVM_BUILD_INSTRUMENTED (#79942)
Build on the Clang-BOLT infrastructure to collect a sample profile for
CSSPGO. Add CSSPGO.cmake and BOLT-CSSPGO.cmake to automate CSSPGO and
CSSPGO+BOLT Clang builds.
Note that `CLANG_PGO_TRAINING_DATA_SOURCE_DIR` is required because the
built-in training set is inadequate for collecting a sampled profile.
Hardware compatibility: CSSPGO requires synchronized (0-skid) call
and branch stacks, which are only available with Intel PEBS (Sandy
Bridge+), AMD Zen3 with BRS, Zen4 with LBRv2+LBR_PMC_FREEZE, and Zen5
with LBRv2. This patch adds support for the Intel
`br_inst_retired.near_taken:uppp` event.
Test Plan:
Added BOLT-CSSPGO.cmake, which is used the same way as BOLT-PGO.cmake,
e.g. for a bootstrapped ThinLTO+CSSPGO+BOLT build, with the CSSPGO
profile collected from an LLVM build and the BOLT profile collected from
Hello World (instrumentation):
```
cmake -B clang-csspgo-bolt -S /path/to/llvm-project/llvm \
-DLLVM_ENABLE_LLD=ON -DBOOTSTRAP_LLVM_ENABLE_LLD=ON \
-DBOOTSTRAP_BOOTSTRAP_LLVM_ENABLE_LLD=ON \
-DPGO_INSTRUMENT_LTO=Thin \
-DBOOTSTRAP_CLANG_PGO_TRAINING_DATA_SOURCE_DIR=/path/to/llvm-project/llvm \
-GNinja -C /path/to/llvm-project/clang/cmake/caches/BOLT-CSSPGO.cmake
ninja stage2-clang-bolt
...
warning: Sample PGO is estimated to optimize better with 19.5x more samples. Please consider increasing sampling rate or profiling for longer duration to get more samples.
...
[2800/2801] Optimizing Clang with BOLT
BOLT-INFO: 8189 out of 106942 functions in the binary (7.7%) have non-empty execution profile
13776393 : taken branches (-42.1%)
```
Performance testing with Clang:
- Setup: Clang-BOLT testing harness
  https://github.com/aaupov/llvm-devmtg-2022/commit/9f2b46f67a1930a51c58a0e4894637a8c64c570e
  - CSSPGO training: building LLVM,
  - InstrPGO training: building Hello World,
  - BOLT training: building Hello World, instrumentation,
  - benchmark: building small LLVM tool (not),
  - 2S Intel SKX Xeon 6138 with 40C/80T and 256GB RAM, using 20C/40T for
    build,
- Results, wall time, lower is better:
  - Baseline (bootstrapped build): 10.36s,
  - InstrPGO + ThinLTO: 9.34s,
  - CSSPGO + ThinLTO: 8.85s.
- BOLT results, for reference:
  - Baseline: 9.09s,
  - InstrPGO + ThinLTO: 9.09s,
  - CSSPGO + ThinLTO: 8.58s.
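
The wall times above can be turned into relative improvements with a
little arithmetic. The following is a minimal sketch, not part of the
patch: the numbers are copied from the results above, while the helper
`reduction_pct` and the group labels "without BOLT" / "with BOLT" are
names made up here for illustration.

```python
# Wall times in seconds, copied from the results listed above.
# The group labels are illustrative, not taken from the commit.
results = {
    "without BOLT": {
        "Baseline": 10.36,
        "InstrPGO + ThinLTO": 9.34,
        "CSSPGO + ThinLTO": 8.85,
    },
    "with BOLT": {
        "Baseline": 9.09,
        "InstrPGO + ThinLTO": 9.09,
        "CSSPGO + ThinLTO": 8.58,
    },
}


def reduction_pct(baseline: float, measured: float) -> float:
    """Percent reduction in wall time relative to the given baseline."""
    return (baseline - measured) / baseline * 100.0


if __name__ == "__main__":
    for group, times in results.items():
        base = times["Baseline"]
        for config, seconds in times.items():
            pct = reduction_pct(base, seconds)
            print(f"{group:13s} {config:20s} {seconds:5.2f}s  {pct:5.1f}% vs. baseline")
```

Under these assumptions, CSSPGO + ThinLTO comes out roughly 14.6% faster
than the bootstrapped baseline (InstrPGO + ThinLTO: roughly 9.8%), and
roughly 5.6% faster than the baseline in the BOLT group.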
---------
Co-authored-by: Matthias Braun <matze at braunis.de>