[all-commits] [llvm/llvm-project] cc2fbc: [CodeLayout] Faster basic block reordering, ext-ts...
spupyrev via All-commits
all-commits at lists.llvm.org
Wed Oct 25 07:52:39 PDT 2023
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: cc2fbc648d7babbfa612f4f5eda3160212ef6ca7
https://github.com/llvm/llvm-project/commit/cc2fbc648d7babbfa612f4f5eda3160212ef6ca7
Author: spupyrev <spupyrev at users.noreply.github.com>
Date: 2023-10-25 (Wed, 25 Oct 2023)
Changed paths:
M llvm/lib/Transforms/Utils/CodeLayout.cpp
M llvm/test/CodeGen/X86/code_placement_ext_tsp.ll
M llvm/test/CodeGen/X86/code_placement_ext_tsp_large.ll
Log Message:
-----------
[CodeLayout] Faster basic block reordering, ext-tsp (#68617)
Aggressive inlining might produce huge functions with more than 10K
basic blocks. Since BFI treats _all_ blocks and jumps as "hot", having
non-negative (but perhaps small) weight, the current implementation can
be slow, taking minutes to produce a layout. This change introduces a
few modifications that significantly (up to 50x on some instances)
speed up the computation. Some notable changes:
- reduced the maximum chain size to 512 (from the prior 4096);
- introduced MaxMergeDensityRatio param to avoid merging chains with
very different densities;
- dropped a couple of params that seem unnecessary.
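A minimal sketch of how the two limits above could gate a chain merge. This is not the code from CodeLayout.cpp: the `ChainSketch` type, the `shouldConsiderMerge` helper, the definition of density as execution count per byte, and the ratio value of 100 are all assumptions made for illustration. Only the chain size limit of 512 and the name MaxMergeDensityRatio come from the commit message.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified stand-in for a chain of basic blocks.
struct ChainSketch {
  std::size_t NumBlocks;   // number of basic blocks in the chain
  std::uint64_t SizeBytes; // total code size of the chain, in bytes
  double ExecutionCount;   // total profile weight of the chain

  // Assumed definition of density: profile weight per byte of code.
  double density() const {
    return SizeBytes == 0 ? 0.0
                          : ExecutionCount / static_cast<double>(SizeBytes);
  }
};

// 512 is the limit named in the commit message.
constexpr std::size_t kMaxChainSize = 512;
// Illustrative value only; the commit message does not state the default.
constexpr double kMaxMergeDensityRatio = 100.0;

// Returns true if merging Pred and Succ is worth evaluating at all.
bool shouldConsiderMerge(const ChainSketch &Pred, const ChainSketch &Succ) {
  // Skip merges that would produce an overly long chain.
  if (Pred.NumBlocks + Succ.NumBlocks > kMaxChainSize)
    return false;

  // Skip merges between chains whose densities differ too much.
  const double PredDensity = Pred.density();
  const double SuccDensity = Succ.density();
  const double MinDensity = std::min(PredDensity, SuccDensity);
  const double MaxDensity = std::max(PredDensity, SuccDensity);
  if (MaxDensity > kMaxMergeDensityRatio * MinDensity)
    return false;

  return true;
}
```

Both checks are cheap filters applied before any merge gain is computed, which is where a speedup of this kind would plausibly come from: fewer candidate pairs reach the expensive scoring step.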
Looking at some "offline" metrics (e.g., the number of created
fall-throughs), there shouldn't be problems; in fact, I do see some
metrics go up. But it might be hard or impossible to measure a perf
difference for such small changes. I did test the performance of a
clang-14 binary and did not record any perf or i-cache-related
differences.
My 6 benchmarks, with ext-tsp runtime (the lower the better) and
"tsp-score" (the higher the better):
**Before**:
- benchmark 1:
num functions: 13,047
reordering running time is 2.4 seconds
score: 125503458 (128.3102%)
- benchmark 2:
num functions: 16,438
reordering running time is 3.4 seconds
score: 12613997277 (129.7495%)
- benchmark 3:
num functions: 12,359
reordering running time is 1.9 seconds
score: 1315881613 (105.8991%)
- benchmark 4:
num functions: 96,588
reordering running time is 7.3 seconds
score: 89513906284 (100.3413%)
- benchmark 5:
num functions: 1
reordering running time is 372 seconds
score: 21292505965077 (99.9979%)
- benchmark 6:
num functions: 71,155
reordering running time is 314 seconds
score: 29795381626270671437824 (102.7519%)
**After**:
- benchmark 1:
reordering running time is 2.2 seconds
score: 125510418 (128.3130%)
- benchmark 2:
reordering running time is 2.6 seconds
score: 12614502162 (129.7525%)
- benchmark 3:
reordering running time is 1.6 seconds
score: 1315938168 (105.9024%)
- benchmark 4:
reordering running time is 4.9 seconds
score: 89518095837 (100.3454%)
- benchmark 5:
reordering running time is 4.8 seconds
score: 21292295939119 (99.9971%)
- benchmark 6:
reordering running time is 104 seconds
score: 29796710925310302879744 (102.7565%)
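For reference, the per-benchmark speedup is the "before" running time divided by the "after" running time. The sketch below does that arithmetic over the six timings listed above; the `Timing` type and the helper names are hypothetical and exist only for this illustration.

```cpp
#include <algorithm>
#include <array>

// Reordering running times in seconds, copied from the lists above.
struct Timing {
  double BeforeSec;
  double AfterSec;
};

constexpr std::array<Timing, 6> kTimings = {{
    {2.4, 2.2},     // benchmark 1
    {3.4, 2.6},     // benchmark 2
    {1.9, 1.6},     // benchmark 3
    {7.3, 4.9},     // benchmark 4
    {372.0, 4.8},   // benchmark 5
    {314.0, 104.0}, // benchmark 6
}};

// Speedup factor for a single benchmark (greater than 1 means faster).
constexpr double speedup(const Timing &T) { return T.BeforeSec / T.AfterSec; }

// Largest speedup factor across all listed benchmarks.
inline double maxSpeedup() {
  double Best = 0.0;
  for (const Timing &T : kTimings)
    Best = std::max(Best, speedup(T));
  return Best;
}
```

The largest gain is on benchmark 5, the single huge function, while the benchmarks with many ordinary functions improve by much smaller factors.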