[llvm] [Codegen] (NFC) Faster algorithm for MachineBlockPlacement (PR #91843)
William Junda Huang via llvm-commits
llvm-commits at lists.llvm.org
Tue Jun 4 19:51:11 PDT 2024
huangjd wrote:
Some performance data, measured on a large internal proto (with 700 fields):
Before:
```
===-------------------------------------------------------------------------===
Pass execution timing report
===-------------------------------------------------------------------------===
Total Execution Time: 77.7118 seconds (77.7120 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
18.7052 ( 27.6%) 0.0006 ( 0.0%) 18.7059 ( 24.1%) 18.7067 ( 24.1%) Branch Probability Basic Block Placement
12.0425 ( 17.8%) 0.1802 ( 1.8%) 12.2227 ( 15.7%) 12.2205 ( 15.7%) Loop Strength Reduction
10.9024 ( 16.1%) 0.2040 ( 2.1%) 11.1064 ( 14.3%) 11.1069 ( 14.3%) Greedy Register Allocator #2
2.3942 ( 3.5%) 7.3894 ( 74.5%) 9.7835 ( 12.6%) 9.7841 ( 12.6%) X86 Assembly Printer
5.8601 ( 8.6%) 1.9305 ( 19.5%) 7.7906 ( 10.0%) 7.7911 ( 10.0%) X86 DAG->DAG Instruction Selection
2.8717 ( 4.2%) 0.0323 ( 0.3%) 2.9040 ( 3.7%) 2.9040 ( 3.7%) Live DEBUG_VALUE analysis
2.7199 ( 4.0%) 0.0010 ( 0.0%) 2.7209 ( 3.5%) 2.7210 ( 3.5%) Machine Instruction Scheduler
2.4429 ( 3.6%) 0.0004 ( 0.0%) 2.4433 ( 3.1%) 2.4434 ( 3.1%) Register Coalescer
0.8445 ( 1.2%) 0.0001 ( 0.0%) 0.8446 ( 1.1%) 0.8447 ( 1.1%) Machine Cycle Info Analysis
0.7191 ( 1.1%) 0.0003 ( 0.0%) 0.7194 ( 0.9%) 0.7195 ( 0.9%) Control Flow Optimizer
...
```
After:
```
===-------------------------------------------------------------------------===
Pass execution timing report
===-------------------------------------------------------------------------===
Total Execution Time: 67.9013 seconds (67.9011 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
12.7974 ( 22.3%) 0.1834 ( 1.7%) 12.9808 ( 19.1%) 12.9781 ( 19.1%) Loop Strength Reduction
11.1578 ( 19.4%) 0.2416 ( 2.3%) 11.3993 ( 16.8%) 11.3999 ( 16.8%) Greedy Register Allocator #2
2.5642 ( 4.5%) 7.6675 ( 73.0%) 10.2317 ( 15.1%) 10.2333 ( 15.1%) X86 Assembly Printer
6.0531 ( 10.5%) 2.2174 ( 21.1%) 8.2706 ( 12.2%) 8.2712 ( 12.2%) X86 DAG->DAG Instruction Selection
5.0775 ( 8.8%) 0.0004 ( 0.0%) 5.0779 ( 7.5%) 5.0781 ( 7.5%) Branch Probability Basic Block Placement
3.3444 ( 5.8%) 0.0234 ( 0.2%) 3.3678 ( 5.0%) 3.3679 ( 5.0%) Live DEBUG_VALUE analysis
2.7316 ( 4.8%) 0.0013 ( 0.0%) 2.7329 ( 4.0%) 2.7331 ( 4.0%) Machine Instruction Scheduler
2.2701 ( 4.0%) 0.0006 ( 0.0%) 2.2707 ( 3.3%) 2.2709 ( 3.3%) Register Coalescer
0.8733 ( 1.5%) 0.0004 ( 0.0%) 0.8738 ( 1.3%) 0.8738 ( 1.3%) Control Flow Optimizer
0.8678 ( 1.5%) 0.0002 ( 0.0%) 0.8680 ( 1.3%) 0.8681 ( 1.3%) Machine Cycle Info Analysis
...
```
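For a quick read of the two reports, the arithmetic can be sketched as below. The User+System figures are copied from the reports above; the variable names are only illustrative, and this is a single run, so small differences in the other passes are likely noise.

```python
# User+System times in seconds, copied from the two timing reports above.
before = {"total": 77.7118, "placement": 18.7059}
after = {"total": 67.9013, "placement": 5.0779}

# Speedup of the Branch Probability Basic Block Placement pass alone.
placement_speedup = before["placement"] / after["placement"]

# Reduction in total execution time across all passes.
total_saved = before["total"] - after["total"]
total_saved_pct = 100.0 * total_saved / before["total"]

print(f"block placement: {placement_speedup:.2f}x faster")
print(f"total: {total_saved:.2f} s saved ({total_saved_pct:.1f}%)")
```

This works out to roughly 3.68x for the placement pass and about 9.81 s (12.6%) off the total.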
I can't find an open-source proto with a non-trivial number of fields, or any source file with a similar structure (many loops in sequence), which is the use case for this optimization.
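As a possible stand-in, a synthetic input with many loops in sequence could be generated with something like the sketch below. This is an assumption, not the internal proto: the function name, the loop body, and the count of 700 are hypothetical, and it has not been checked that the output triggers the same slow path in MachineBlockPlacement.

```python
def make_sequential_loops(n_loops: int) -> str:
    """Return C source for one function containing n_loops loops in sequence.

    Hypothetical shape only; the real generated proto code is not public.
    """
    lines = [
        "void sink(int);",  # external call so each loop body is not trivially dead
        "",
        "void many_loops(const int *data, int len) {",
    ]
    for i in range(n_loops):
        lines.append(f"  for (int j = 0; j < len; ++j) sink(data[j] + {i});")
    lines.append("}")
    return "\n".join(lines) + "\n"


# 700 mirrors the field count mentioned above; any large value could be tried.
source = make_sequential_loops(700)
with open("many_loops.c", "w") as f:
    f.write(source)
```

The resulting file could then be compiled with pass timing enabled (for example `clang -O2 -c -ftime-report many_loops.c`) to see whether block placement shows up near the top of the report.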
https://github.com/llvm/llvm-project/pull/91843