[PATCH] Implement ADRP CSE for global symbols
Quentin Colombet
qcolombet at apple.com
Wed May 14 19:18:31 PDT 2014
Hi Jiangning,
Sorry for the delay in getting back to you, but the results are not what I was expecting and I had to double-check them.
So, the short story is, I am seeing a lot of regressions, although the number of ADRP/ADR instructions decreased a lot in the final binaries.
Note: for the performance numbers, I have filtered out the tests that run for less than a second and the tests where Reference and Test have similar performance:
Columns:
- Reference: Global merge on external disabled.
- Test: Global merge on external enabled as well as the alignment thing.
- Expansion: Test/Reference (smaller is better).
### Performances ###
** O3 **
Benchmark_ID Reference Test Expansion Percent
-------------------------------------------------------------------------------
ASC_Sequoia/IRSmk/IRSmk 16.6782 16.5785 0.99 -1%
Adobe-C++/loop_unroll 4.9308 5.0514 1.02 +2%
BenchmarkGame/n-body 2.38 2.3966 1.01 +1%
CFP2000/177.mesa/177.me 4.0972 4.1418 1.01 +1%
CINT2000/164.gzip/164.g 22.8601 23.2677 1.02 +2%
CINT2000/186.crafty/186 9.1118 9.2452 1.01 +1%
CINT2000/253.perlbmk/25 13.5299 13.6566 1.01 +1%
CINT2000/254.gap/254.ga 3.8361 3.8128 0.99 -1%
CINT2000/255.vortex/255 4.8632 5.0017 1.03 +3%
CINT2006/403.gcc/403.gc 3.251 3.2304 0.99 -1%
CINT2006/458.sjeng/458. 8.8006 8.7538 0.99 -1%
CINT2006/471.omnetpp/47 1.3495 1.323 0.98 -2%
McGill/queens 4.2418 4.2123 0.99 -1%
Misc-C++/Large/ray 5.2962 5.3265 1.01 +1%
Olden/power/power 2.3012 2.3335 1.01 +1%
SIBsim4/SIBsim4 5.6249 5.5846 0.99 -1%
Shootout-C++/ary3 2.1923 2.2114 1.01 +1%
Shootout-C++/lists1 1.4118 1.4452 1.02 +2%
VersaBench/8b10b/8b10b 13.3247 12.9642 0.97 -3%
VersaBench/ecbdes/ecbde 4.9098 4.9364 1.01 +1%
aha/aha 4.2497 4.3452 1.02 +2%
lambda-0.1.3/lambda 8.8454 8.7103 0.98 -2%
mafft/pairlocalalign 59.696 57.5007 0.96 -4%
siod/siod 3.4691 3.4929 1.01 +1%
-------------------------------------------------------------------------------
Min (24) - - 0.96 -
-------------------------------------------------------------------------------
Max (24) - - 1.03 -
-------------------------------------------------------------------------------
Sum (24) 211 210 0.99 +1%
-------------------------------------------------------------------------------
A.Mean (24) - - 1 +0%
-------------------------------------------------------------------------------
G.Mean 2 (24) - - 1 +0%
-------------------------------------------------------------------------------
Overall there are more regressions (14) than improvements (10) and on average it is neutral.
** Os **
Benchmark_ID Reference Test Expansion Percent
-------------------------------------------------------------------------------
Adobe-C++/loop_unroll 5.0411 5.168 1.03 +3%
CINT2000/164.gzip/164.g 22.5516 23.0424 1.02 +2%
CINT2000/186.crafty/186 9.5044 9.6343 1.01 +1%
CINT2006/400.perlbench/ 15.0506 15.1874 1.01 +1%
CINT2006/456.hmmer/456. 6.088 6.1583 1.01 +1%
CINT2006/462.libquantum 2.6566 2.678 1.01 +1%
McGill/queens 4.2193 4.186 0.99 -1%
Misc-C++/Large/ray 6.5632 6.6263 1.01 +1%
Olden/power/power 3.4379 3.2233 0.94 -6%
Polybench/stencils/fdtd 3.6964 3.718 1.01 +1%
Ptrdist/ft/ft 2.3729 2.3966 1.01 +1%
SIBsim4/SIBsim4 5.6219 5.6644 1.01 +1%
Trimaran/enc-3des/enc-3 3.7445 3.7256 0.99 -1%
Trimaran/enc-pc1/enc-pc 1.6924 1.7176 1.01 +1%
VersaBench/8b10b/8b10b 13.4432 13.1539 0.98 -2%
lambda-0.1.3/lambda 8.681 9.1763 1.06 +6%
mafft/pairlocalalign 59.1719 58.6067 0.99 -1%
povray 6.2992 6.4061 1.02 +2%
siod/siod 4.2536 4.3336 1.02 +2%
-------------------------------------------------------------------------------
Min (19) - - 0.94 -
-------------------------------------------------------------------------------
Max (19) - - 1.06 -
-------------------------------------------------------------------------------
Sum (19) 184 185 1 +0%
-------------------------------------------------------------------------------
A.Mean (19) - - 1.01 +1%
-------------------------------------------------------------------------------
G.Mean 2 (19) - - 1.01 +1%
-------------------------------------------------------------------------------
Overall there are more regressions (14) than improvements (5) and on average it is a regression.
### Static Count of ADRP/ADR ###
This is the number of ADRP/ADR instructions in the final binaries; in other words, the linker optimizations have already taken place.
** O3 **
-------------------------------------------------------------------------------
Min (90) - - 0.3 -
-------------------------------------------------------------------------------
Max (90) - - 1.11 -
-------------------------------------------------------------------------------
Sum (90) 160078 152264 0.95 +5%
-------------------------------------------------------------------------------
A.Mean (90) - - 0.88 -12%
-------------------------------------------------------------------------------
G.Mean 2 (90) - - 0.86 -14%
-------------------------------------------------------------------------------
Here are the details for the regressions:
Benchmark_ID Reference Test Expansion Percent
-------------------------------------------------------------------------------
C/Output/globalrefs.sim 9 10 1.11 +11%
CFP2006/444.namd/Output 431 444 1.03 +3%
Misc/Output/flops-1.sim 16 17 1.06 +6%
Misc/Output/flops-3.sim 15 16 1.07 +7%
Misc/Output/flops-8.sim 15 16 1.07 +7%
Shootout-C++/EH/Output/ 16 17 1.06 +6%
-------------------------------------------------------------------------------
Min (6) - - 1.03 -
-------------------------------------------------------------------------------
Max (6) - - 1.11 -
-------------------------------------------------------------------------------
Sum (6) 502 520 1.04 -3%
-------------------------------------------------------------------------------
A.Mean (6) - - 1.07 +7%
-------------------------------------------------------------------------------
G.Mean 2 (6) - - 1.07 +7%
-------------------------------------------------------------------------------
** Os **
-------------------------------------------------------------------------------
Min (87) - - 0.3 -
-------------------------------------------------------------------------------
Max (87) - - 1.11 -
-------------------------------------------------------------------------------
Sum (87) 101239 96762 0.96 +5%
-------------------------------------------------------------------------------
A.Mean (87) - - 0.89 -11%
-------------------------------------------------------------------------------
G.Mean 2 (87) - - 0.86 -14%
-------------------------------------------------------------------------------
Here are the details for the regressions:
Benchmark_ID Reference Test Expansion Percent
-------------------------------------------------------------------------------
C/Output/globalrefs.sim 9 10 1.11 +11%
CFP2006/444.namd/Output 349 385 1.1 +10%
Misc/Output/flops-1.sim 16 17 1.06 +6%
Misc/Output/flops-3.sim 15 16 1.07 +7%
Misc/Output/flops-8.sim 15 16 1.07 +7%
Shootout-C++/EH/Output/ 10 11 1.1 +10%
-------------------------------------------------------------------------------
Min (6) - - 1.06 -
-------------------------------------------------------------------------------
Max (6) - - 1.11 -
-------------------------------------------------------------------------------
Sum (6) 414 455 1.1 -9%
-------------------------------------------------------------------------------
A.Mean (6) - - 1.09 +9%
-------------------------------------------------------------------------------
G.Mean 2 (6) - - 1.08 +8%
-------------------------------------------------------------------------------
### Long Story ###
After the first run, a lot of applications were regressing, so I suspected that my device had somehow gotten noisy. I've rerun all the regressions/improvements 10 times and looked at the standard deviation (SD). The SD was close to 0 percent for most tests, i.e., the regressions/improvements were real.
I've investigated a few of them and figured out that it was our lowering of ADRP through a pseudo instruction that was biting us. The short story is that this was producing redundant ADRPs, as you initially saw when you started working on that patch.
I thought I had fixed the problem with a quick patch prior to the first round of experiments, but apparently it was not sufficient.
Anyway, I have made another quick patch where I basically blocked any folding of those instructions, thus maximizing the reuse of ADRPs. This is the current baseline.
However, as the numbers show, even though we remove a lot of ADRPs, there is something else going on that makes this worthless.
I have to dig into that to see what we can do.
In the meantime, I can give you the raw numbers if you want to gather your own statistics.
Thanks,
-Quentin
http://reviews.llvm.org/D3432