[llvm] [MCP] Move dependencies if they block copy propagation (PR #105562)
Gábor Spaits via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 30 04:31:37 PDT 2024
spaits wrote:
I have done another round of benchmarking. This time each version was compiled five times, and for each compilation I ran llvm-lit 4 times.
The following versions were considered:
- Baseline: 22d3fb182c9199ac3d51e5577c6647508a7a37f0
- DDG for regions: a34e68d6390cf23a88315e3b3527748f7724639c
There is a compile-time regression relative to the earlier state, where the DDG was built per basic block.
Left (lhs): Baseline. Right (rhs): DDG for regions.
Compile time:
```
python3 ../utils/compare.py -m compile_time resold10.json resold11.json resold12.json resold13.json resold14.json resold15.json resold16.json resold17.json resold18.json resold19.json resold110.json resold111.json resold112.json resold113.json resold114.json resold115.json resold116.json resold117.json resold118.json resold119.json vs resnew10.json resnew11.json resnew12.json resnew13.json resnew14.json resnew15.json resnew16.json resnew17.json resnew18.json resnew19.json resnew110.json resnew111.json resnew112.json resnew113.json resnew114.json resnew115.json resnew116.json resnew117.json resnew118.json resnew119.json
Tests: 10
Metric: compile_time
/home/spaits/repo/llvm-test-suite/build/../utils/compare.py:206: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
name0 = names[0]
Program                              compile_time
                                         lhs      rhs    diff
lencod/lencod                          42.12    43.52    3.3%
kimwitu++/kc                           36.16    37.06    2.5%
tramp3d-v4/tramp3d-v4                  35.01    35.65    1.8%
mafft/pairlocalalign                   22.52    22.85    1.4%
Bullet/bullet                          81.81    82.85    1.3%
7zip/7zip-benchmark                   174.25   176.05    1.0%
consumer-typeset/consumer-typeset      27.11    27.33    0.8%
SPASS/SPASS                            37.69    37.78    0.3%
ClamAV/clamscan                        44.05    44.03   -0.0%
sqlite3/sqlite3                        14.02    13.97   -0.3%
Geomean difference                                       1.2%
compile_time
l/r            lhs         rhs        diff
count    10.000000   10.000000   10.000000
mean     51.474390   52.109240    0.012104
std      46.747508   47.224755    0.011271
min      14.017200   13.974600   -0.003039
25%      29.084400   29.413425    0.003998
50%      36.923200   37.421600    0.011474
75%      43.570300   43.903125    0.017345
max     174.252400  176.048400    0.033065
```
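For reference, the "Geomean difference" row can be approximated by hand from the table above. This is a minimal sketch that assumes the figure is the geometric mean of the per-program `rhs/lhs` ratios minus one; I have not checked that `compare.py` computes it exactly this way, and the inputs are the rounded two-decimal values from the table, so small deviations are expected. The names `COMPILE_TIMES` and `geomean_difference` are made up for this illustration.

```python
import math

# (lhs, rhs) compile times copied from the table above (rounded to 2 decimals).
COMPILE_TIMES = {
    "lencod/lencod": (42.12, 43.52),
    "kimwitu++/kc": (36.16, 37.06),
    "tramp3d-v4/tramp3d-v4": (35.01, 35.65),
    "mafft/pairlocalalign": (22.52, 22.85),
    "Bullet/bullet": (81.81, 82.85),
    "7zip/7zip-benchmark": (174.25, 176.05),
    "consumer-typeset/consumer-typeset": (27.11, 27.33),
    "SPASS/SPASS": (37.69, 37.78),
    "ClamAV/clamscan": (44.05, 44.03),
    "sqlite3/sqlite3": (14.02, 13.97),
}


def geomean_difference(pairs):
    """Geometric mean of the rhs/lhs ratios, minus 1 (assumed definition)."""
    # Averaging the logs of the ratios and exponentiating gives the geomean.
    log_sum = sum(math.log(rhs / lhs) for lhs, rhs in pairs)
    return math.exp(log_sum / len(pairs)) - 1.0


diff = geomean_difference(list(COMPILE_TIMES.values()))
print(f"Geomean difference: {diff:.1%}")
```

Under that assumption the result agrees with the reported 1.2% to within rounding.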
Exec_time:
```
python3 ../utils/compare.py -m exec_time resold10.json resold11.json resold12.json resold13.json resold14.json resold15.json resold16.json resold17.json resold18.json resold19.json resold110.json resold111.json resold112.json resold113.json resold114.json resold115.json resold116.json resold117.json resold118.json resold119.json vs resnew10.json resnew11.json resnew12.json resnew13.json resnew14.json resnew15.json resnew16.json resnew17.json resnew18.json resnew19.json resnew110.json resnew111.json resnew112.json resnew113.json resnew114.json resnew115.json resnew116.json resnew117.json resnew118.json resnew119.json
Tests: 10
Metric: exec_time
/home/spaits/repo/llvm-test-suite/build/../utils/compare.py:206: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
name0 = names[0]
Program                              exec_time
                                         lhs      rhs    diff
sqlite3/sqlite3                         1.01     1.03    2.4%
SPASS/SPASS                             3.50     3.55    1.3%
ClamAV/clamscan                         0.05     0.05    1.0%
mafft/pairlocalalign                    8.75     8.82    0.8%
lencod/lencod                           1.80     1.81    0.7%
tramp3d-v4/tramp3d-v4                   0.06     0.06   -0.6%
Bullet/bullet                           1.65     1.63   -0.8%
7zip/7zip-benchmark                     4.87     4.82   -1.1%
consumer-typeset/consumer-typeset       0.04     0.04   -6.3%
kimwitu++/kc                            0.01     0.01  -14.3%
Geomean difference                                      -1.8%
exec_time
l/r            lhs         rhs        diff
count    10.000000   10.000000   10.000000
mean      2.174090    2.182130   -0.016837
std       2.833687    2.847120    0.050182
min       0.007000    0.006000   -0.142857
25%       0.051375    0.051650   -0.009887
50%       1.327900    1.333600    0.000346
75%       3.075775    3.112825    0.009726
max       8.749700    8.816100    0.023854
```
Size of text section:
```
python3 ../utils/compare.py -m size..text resold10.json resold11.json resold12.json resold13.json resold14.json resold15.json resold16.json resold17.json resold18.json resold19.json resold110.json resold111.json resold112.json resold113.json resold114.json resold115.json resold116.json resold117.json resold118.json resold119.json vs resnew10.json resnew11.json resnew12.json resnew13.json resnew14.json resnew15.json resnew16.json resnew17.json resnew18.json resnew19.json resnew110.json resnew111.json resnew112.json resnew113.json resnew114.json resnew115.json resnew116.json resnew117.json resnew118.json resnew119.json
Tests: 10
Metric: size..text
/home/spaits/repo/llvm-test-suite/build/../utils/compare.py:206: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
name0 = names[0]
Program                              size..text
                                            lhs        rhs    diff
lencod/lencod                         763725.00  763725.00    0.0%
mafft/pairlocalalign                  452119.00  452103.00   -0.0%
kimwitu++/kc                          403971.00  403955.00   -0.0%
tramp3d-v4/tramp3d-v4                 884131.00  884083.00   -0.0%
Bullet/bullet                         726622.00  726574.00   -0.0%
consumer-typeset/consumer-typeset     442577.00  442529.00   -0.0%
7zip/7zip-benchmark                   907121.00  906945.00   -0.0%
ClamAV/clamscan                       540338.00  540178.00   -0.0%
sqlite3/sqlite3                       490639.00  490447.00   -0.0%
SPASS/SPASS                           505890.00  505586.00   -0.1%
Geomean difference                                           -0.0%
size..text
l/r               lhs           rhs       diff
count       10.000000      10.00000  10.000000
mean    611713.300000  611612.50000  -0.000179
std     190314.056507  190320.06691   0.000195
min     403971.000000  403955.00000  -0.000601
25%     461749.000000  461689.00000  -0.000271
50%     523114.000000  522882.00000  -0.000087
75%     754449.250000  754437.25000  -0.000043
max     907121.000000  906945.00000   0.000000
```
I ran these benchmarks on my laptop.
It is a fairly strong machine, with a 12th Gen Intel i7-1265U (12) @ 4.800GHz and 32 GB of RAM, but I don't think I can measure things like compile time and run time reliably on it. The laptop's temperature alone can add or remove whole percentage points from the compile time and exec time. I tried to conduct the measurements so that each benchmark session starts from the same state, but that is really hard.
When running `llvm-lit`, only the run-time results change. The compile time is determined when running `ninja`.
So when comparing exec time, 20 results were really merged and compared, but when comparing compile time, only 5 distinct results were merged and compared. After code size, which is fairly static (you can compile the same code 100 times with the same compiler and get the same code size each time), the execution time is therefore the most accurate result.
I don't have better equipment right now. Also, I have only checked the x86-64 target, which is not that prone to patterns like the one addressed by this patch. I think the best approach would be to try this out on Arm or RISC-V and do the compilation in a more consistent environment.
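To make the sample counts concrete, here is a small sketch of the bookkeeping. It assumes each result file can be tagged with the build it came from; the record layout and the function name `count_independent_samples` are hypothetical and are not part of `compare.py` or `llvm-lit`.

```python
BUILDS = 5           # each version was compiled five times
RUNS_PER_BUILD = 4   # llvm-lit was run four times per compilation

# Hypothetical records: one per result file, tagged with its build and run.
results = [
    {"build": build, "run": run}
    for build in range(BUILDS)
    for run in range(RUNS_PER_BUILD)
]


def count_independent_samples(records):
    """Return (exec_time samples, compile_time samples) for one version."""
    # Every llvm-lit run re-measures execution time, so each file counts.
    exec_samples = len(records)
    # Compile time is fixed at build time, so files from the same build
    # repeat the same measurement and only distinct builds count.
    compile_samples = len({record["build"] for record in records})
    return exec_samples, compile_samples


exec_samples, compile_samples = count_independent_samples(results)
print(f"exec_time samples: {exec_samples}, compile_time samples: {compile_samples}")
```

So although 20 files per version are passed to `compare.py`, the compile-time comparison rests on only 5 independent measurements per version.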
### Also one more possible improvement:
Since I only build the DDG when needed, we no longer have to deal with register renames after the DDG is built, so this whole thing can be done in one stage again.
I will try that; it may reduce compile time further.
https://github.com/llvm/llvm-project/pull/105562