[clang] [clang][perf-training] Fix profiling with -DCLANG_BOLT=perf (PR #119117)

Mon Dec 9 13:49:37 PST 2024

aaupov wrote:

> However, I'm wondering if maybe we should disable llvm-support and only run hello-world with -DCLANG_BOLT=perf.

Existing perf training is inadequate for collecting a sampled profile – we simply don't get enough samples, and no-LBR mode reduces the resulting performance gain even further. If we wanted to pursue perf sampling further, we'd need to extend perf training by building either LLVM subtargets or llvm-test-suite.

> The bolt optimizations with perf only give about a 3% performance increase (although maybe with hw counters this would be better) 

Yes, no-LBR mode has very limited benefit due to missing edge counts.

> and it takes a very long time to convert all the perf profiles to the fdata format.

As discussed on Discord, we may be able to reduce the conversion time by dropping `profile-format=yaml`, which is not required in this case. By the way, although I understand the overhead adds up, we shouldn't be converting many profiles: I guess just two, one for cxx/helloworld and another for llvm-support.
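
For illustration only, here is a rough sketch of what the conversion step could look like with the YAML output made optional. The helper name `perf2bolt_command` and the profile file names are hypothetical and not taken from the PR; the flags (`-p`, `-o`, `-nl`, `--profile-format=yaml`) are the ones I understand perf2bolt to accept, so treat the details as assumptions.

```python
def perf2bolt_command(perf2bolt, binary, perf_data, output, lbr=False, yaml=False):
    """Build the argv list for one perf2bolt conversion (hypothetical helper)."""
    cmd = [perf2bolt, binary, "-p", perf_data, "-o", output]
    if yaml:
        # Only request YAML when it is actually needed; otherwise let
        # perf2bolt write its default fdata output.
        cmd.append("--profile-format=yaml")
    if not lbr:
        # No-LBR mode: aggregate plain samples without branch records.
        cmd.append("-nl")
    return cmd


if __name__ == "__main__":
    # Assumed: one perf profile per training workload (names are made up).
    for name in ("cxx-hello-world", "llvm-support"):
        argv = perf2bolt_command(
            "perf2bolt", "clang", f"{name}.perf.data", f"{name}.fdata"
        )
        print(" ".join(argv))
```

With only two workloads this amounts to two perf2bolt invocations, so the total cost should be dominated by how long each individual conversion takes.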

https://github.com/llvm/llvm-project/pull/119117