[PATCH] Implement ADRP CSE for global symbols
Quentin Colombet
qcolombet at apple.com
Wed May 14 19:18:31 PDT 2014
Hi Jiangning,
Sorry for the delay in getting back to you, but the results are not what I was expecting and I had to double-check them.
So, the short story is, I am seeing a lot of regressions, although the number of ADRP/ADR instructions decreased a lot in the final binaries.
Note: for the performance numbers, I have filtered out the tests that run for less than a second and the tests where Reference and Test have similar performance:
Columns:
- Reference: Global merge on external disabled.
- Test: Global merge on external enabled as well as the alignment thing.
- Expansion: Test/Reference (smaller is better).
### Performances ###
** O3 **
Benchmark_ID Reference Test Expansion Percent
-------------------------------------------------------------------------------
ASC_Sequoia/IRSmk/IRSmk 16.6782 16.5785 0.99 -1%
Adobe-C++/loop_unroll 4.9308 5.0514 1.02 +2%
BenchmarkGame/n-body 2.38 2.3966 1.01 +1%
CFP2000/177.mesa/177.me 4.0972 4.1418 1.01 +1%
CINT2000/164.gzip/164.g 22.8601 23.2677 1.02 +2%
CINT2000/186.crafty/186 9.1118 9.2452 1.01 +1%
CINT2000/253.perlbmk/25 13.5299 13.6566 1.01 +1%
CINT2000/254.gap/254.ga 3.8361 3.8128 0.99 -1%
CINT2000/255.vortex/255 4.8632 5.0017 1.03 +3%
CINT2006/403.gcc/403.gc 3.251 3.2304 0.99 -1%
CINT2006/458.sjeng/458. 8.8006 8.7538 0.99 -1%
CINT2006/471.omnetpp/47 1.3495 1.323 0.98 -2%
McGill/queens 4.2418 4.2123 0.99 -1%
Misc-C++/Large/ray 5.2962 5.3265 1.01 +1%
Olden/power/power 2.3012 2.3335 1.01 +1%
SIBsim4/SIBsim4 5.6249 5.5846 0.99 -1%
Shootout-C++/ary3 2.1923 2.2114 1.01 +1%
Shootout-C++/lists1 1.4118 1.4452 1.02 +2%
VersaBench/8b10b/8b10b 13.3247 12.9642 0.97 -3%
VersaBench/ecbdes/ecbde 4.9098 4.9364 1.01 +1%
aha/aha 4.2497 4.3452 1.02 +2%
lambda-0.1.3/lambda 8.8454 8.7103 0.98 -2%
mafft/pairlocalalign 59.696 57.5007 0.96 -4%
siod/siod 3.4691 3.4929 1.01 +1%
-------------------------------------------------------------------------------
Min (24) - - 0.96 -
-------------------------------------------------------------------------------
Max (24) - - 1.03 -
-------------------------------------------------------------------------------
Sum (24) 211 210 0.99 +1%
-------------------------------------------------------------------------------
A.Mean (24) - - 1 +0%
-------------------------------------------------------------------------------
G.Mean 2 (24) - - 1 +0%
-------------------------------------------------------------------------------
Overall there are more regressions (14) than improvements (10) and on average it is neutral.
** Os **
Benchmark_ID Reference Test Expansion Percent
-------------------------------------------------------------------------------
Adobe-C++/loop_unroll 5.0411 5.168 1.03 +3%
CINT2000/164.gzip/164.g 22.5516 23.0424 1.02 +2%
CINT2000/186.crafty/186 9.5044 9.6343 1.01 +1%
CINT2006/400.perlbench/ 15.0506 15.1874 1.01 +1%
CINT2006/456.hmmer/456. 6.088 6.1583 1.01 +1%
CINT2006/462.libquantum 2.6566 2.678 1.01 +1%
McGill/queens 4.2193 4.186 0.99 -1%
Misc-C++/Large/ray 6.5632 6.6263 1.01 +1%
Olden/power/power 3.4379 3.2233 0.94 -6%
Polybench/stencils/fdtd 3.6964 3.718 1.01 +1%
Ptrdist/ft/ft 2.3729 2.3966 1.01 +1%
SIBsim4/SIBsim4 5.6219 5.6644 1.01 +1%
Trimaran/enc-3des/enc-3 3.7445 3.7256 0.99 -1%
Trimaran/enc-pc1/enc-pc 1.6924 1.7176 1.01 +1%
VersaBench/8b10b/8b10b 13.4432 13.1539 0.98 -2%
lambda-0.1.3/lambda 8.681 9.1763 1.06 +6%
mafft/pairlocalalign 59.1719 58.6067 0.99 -1%
povray 6.2992 6.4061 1.02 +2%
siod/siod 4.2536 4.3336 1.02 +2%
-------------------------------------------------------------------------------
Min (19) - - 0.94 -
-------------------------------------------------------------------------------
Max (19) - - 1.06 -
-------------------------------------------------------------------------------
Sum (19) 184 185 1 +0%
-------------------------------------------------------------------------------
A.Mean (19) - - 1.01 +1%
-------------------------------------------------------------------------------
G.Mean 2 (19) - - 1.01 +1%
-------------------------------------------------------------------------------
Overall there are more regressions (14) than improvements (5) and on average it is a regression.
### Static Count of ADRP/ADR ###
This is the number of ADRP/ADR instructions in the final binaries; in other words, the linker optimizations have already taken place.
** O3 **
-------------------------------------------------------------------------------
Min (90) - - 0.3 -
-------------------------------------------------------------------------------
Max (90) - - 1.11 -
-------------------------------------------------------------------------------
Sum (90) 160078 152264 0.95 +5%
-------------------------------------------------------------------------------
A.Mean (90) - - 0.88 -12%
-------------------------------------------------------------------------------
G.Mean 2 (90) - - 0.86 -14%
-------------------------------------------------------------------------------
Here are the details for the regressions:
Benchmark_ID Reference Test Expansion Percent
-------------------------------------------------------------------------------
C/Output/globalrefs.sim 9 10 1.11 +11%
CFP2006/444.namd/Output 431 444 1.03 +3%
Misc/Output/flops-1.sim 16 17 1.06 +6%
Misc/Output/flops-3.sim 15 16 1.07 +7%
Misc/Output/flops-8.sim 15 16 1.07 +7%
Shootout-C++/EH/Output/ 16 17 1.06 +6%
-------------------------------------------------------------------------------
Min (6) - - 1.03 -
-------------------------------------------------------------------------------
Max (6) - - 1.11 -
-------------------------------------------------------------------------------
Sum (6) 502 520 1.04 -3%
-------------------------------------------------------------------------------
A.Mean (6) - - 1.07 +7%
-------------------------------------------------------------------------------
G.Mean 2 (6) - - 1.07 +7%
-------------------------------------------------------------------------------
** Os **
-------------------------------------------------------------------------------
Min (87) - - 0.3 -
-------------------------------------------------------------------------------
Max (87) - - 1.11 -
-------------------------------------------------------------------------------
Sum (87) 101239 96762 0.96 +5%
-------------------------------------------------------------------------------
A.Mean (87) - - 0.89 -11%
-------------------------------------------------------------------------------
G.Mean 2 (87) - - 0.86 -14%
-------------------------------------------------------------------------------
Here are the details for the regressions:
Benchmark_ID Reference Test Expansion Percent
-------------------------------------------------------------------------------
C/Output/globalrefs.sim 9 10 1.11 +11%
CFP2006/444.namd/Output 349 385 1.1 +10%
Misc/Output/flops-1.sim 16 17 1.06 +6%
Misc/Output/flops-3.sim 15 16 1.07 +7%
Misc/Output/flops-8.sim 15 16 1.07 +7%
Shootout-C++/EH/Output/ 10 11 1.1 +10%
-------------------------------------------------------------------------------
Min (6) - - 1.06 -
-------------------------------------------------------------------------------
Max (6) - - 1.11 -
-------------------------------------------------------------------------------
Sum (6) 414 455 1.1 -9%
-------------------------------------------------------------------------------
A.Mean (6) - - 1.09 +9%
-------------------------------------------------------------------------------
G.Mean 2 (6) - - 1.08 +8%
-------------------------------------------------------------------------------
### Long Story ###
After the first run, a lot of applications were regressing, so I suspected that my device had somehow gotten noisy. I've rerun all the regressions/improvements 10 times and looked at the standard deviation (SD). The SD was close to 0 percent for most tests, i.e., the regressions/improvements were real.
I've investigated a few of them and figured out that it was our lowering of ADRP through a pseudo instruction that was biting us. The short story is that this was producing redundant ADRPs, as you initially saw when you started working on that patch.
I thought I had fixed the problem with a quick patch prior to the first round of experiments, but apparently it was not sufficient.
Anyway, I have made another quick patch where I basically blocked any folding of those instructions, thus maximizing the reuse of ADRPs. This is the current baseline.
However, as the numbers show, even though we remove a lot of ADRPs, there is something else going on that makes this worthless.
I have to dig into that to see what we can do.
In the meantime, I can give you the raw numbers if you want to gather your own statistics.
Thanks,
-Quentin
http://reviews.llvm.org/D3432