[llvm] [MCP] Move dependencies if they block copy propagation (PR #105562)

Fri Aug 30 04:31:37 PDT 2024

spaits wrote:

I have done yet another round of benchmarking. This time the benchmarks were compiled five times with each version, and for each compilation I ran llvm-lit 4 times.

The following versions were considered:
- Baseline: 22d3fb182c9199ac3d51e5577c6647508a7a37f0
- DDG for regions: a34e68d6390cf23a88315e3b3527748f7724639c

There is a compile-time regression relative to the earlier state, where the DDG was built for basic blocks.

In the tables below, the left side (lhs) is the baseline and the right side (rhs) is DDG for regions.

Compile time:
```
python3 ../utils/compare.py -m compile_time resold10.json resold11.json resold12.json resold13.json resold14.json resold15.json resold16.json resold17.json resold18.json resold19.json resold110.json resold111.json resold112.json resold113.json resold114.json resold115.json resold116.json resold117.json resold118.json resold119.json  vs resnew10.json resnew11.json resnew12.json resnew13.json resnew14.json resnew15.json resnew16.json resnew17.json resnew18.json resnew19.json resnew110.json resnew111.json resnew112.json resnew113.json resnew114.json resnew115.json resnew116.json resnew117.json resnew118.json resnew119.json
Tests: 10
Metric: compile_time

/home/spaits/repo/llvm-test-suite/build/../utils/compare.py:206: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  name0 = names[0]
Program                                       compile_time             
                                              lhs          rhs    diff 
lencod/lencod                                  42.12        43.52  3.3%
kimwitu++/kc                                   36.16        37.06  2.5%
tramp3d-v4/tramp3d-v4                          35.01        35.65  1.8%
mafft/pairlocalalign                           22.52        22.85  1.4%
Bullet/bullet                                  81.81        82.85  1.3%
7zip/7zip-benchmark                           174.25       176.05  1.0%
consumer-typeset/consumer-typeset              27.11        27.33  0.8%
SPASS/SPASS                                    37.69        37.78  0.3%
ClamAV/clamscan                                44.05        44.03 -0.0%
sqlite3/sqlite3                                14.02        13.97 -0.3%
                           Geomean difference                      1.2%
      compile_time                       
l/r            lhs         rhs       diff
count  10.000000    10.000000   10.000000
mean   51.474390    52.109240   0.012104 
std    46.747508    47.224755   0.011271 
min    14.017200    13.974600  -0.003039 
25%    29.084400    29.413425   0.003998 
50%    36.923200    37.421600   0.011474 
75%    43.570300    43.903125   0.017345 
max    174.252400   176.048400  0.033065
```
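
As a sanity check on the summary line, the geomean difference can be recomputed from the per-program rows above. This is a minimal sketch, assuming the summary is the geometric mean of the rhs/lhs ratios minus one. The function and variable names are mine, not from `compare.py`. The inputs are the rounded values from the table, so the result should only agree with the reported 1.2% to about one decimal place.

```python
import math

# (lhs, rhs) compile times in seconds, copied from the rounded table rows above.
compile_times = {
    "lencod/lencod": (42.12, 43.52),
    "kimwitu++/kc": (36.16, 37.06),
    "tramp3d-v4/tramp3d-v4": (35.01, 35.65),
    "mafft/pairlocalalign": (22.52, 22.85),
    "Bullet/bullet": (81.81, 82.85),
    "7zip/7zip-benchmark": (174.25, 176.05),
    "consumer-typeset/consumer-typeset": (27.11, 27.33),
    "SPASS/SPASS": (37.69, 37.78),
    "ClamAV/clamscan": (44.05, 44.03),
    "sqlite3/sqlite3": (14.02, 13.97),
}


def geomean_difference(pairs):
    """Geometric mean of the rhs/lhs ratios, minus one (assumed definition)."""
    log_sum = sum(math.log(rhs / lhs) for lhs, rhs in pairs)
    return math.exp(log_sum / len(pairs)) - 1.0


diff = geomean_difference(list(compile_times.values()))
print(f"Geomean difference: {diff:.1%}")
```

Averaging the logs of the ratios keeps one long-running program (such as 7zip here) from dominating the summary the way it would in an arithmetic mean of absolute times.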

Exec_time:
```
python3 ../utils/compare.py -m exec_time resold10.json resold11.json resold12.json resold13.json resold14.json resold15.json resold16.json resold17.json resold18.json resold19.json resold110.json resold111.json resold112.json resold113.json resold114.json resold115.json resold116.json resold117.json resold118.json resold119.json  vs resnew10.json resnew11.json resnew12.json resnew13.json resnew14.json resnew15.json resnew16.json resnew17.json resnew18.json resnew19.json resnew110.json resnew111.json resnew112.json resnew113.json resnew114.json resnew115.json resnew116.json resnew117.json resnew118.json resnew119.json
Tests: 10
Metric: exec_time

/home/spaits/repo/llvm-test-suite/build/../utils/compare.py:206: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  name0 = names[0]
Program                                       exec_time              
                                              lhs       rhs    diff  
sqlite3/sqlite3                                 1.01      1.03   2.4%
SPASS/SPASS                                     3.50      3.55   1.3%
ClamAV/clamscan                                 0.05      0.05   1.0%
mafft/pairlocalalign                            8.75      8.82   0.8%
lencod/lencod                                   1.80      1.81   0.7%
tramp3d-v4/tramp3d-v4                           0.06      0.06  -0.6%
Bullet/bullet                                   1.65      1.63  -0.8%
7zip/7zip-benchmark                             4.87      4.82  -1.1%
consumer-typeset/consumer-typeset               0.04      0.04  -6.3%
kimwitu++/kc                                    0.01      0.01 -14.3%
                           Geomean difference                   -1.8%
       exec_time                      
l/r          lhs        rhs       diff
count  10.000000  10.000000  10.000000
mean   2.174090   2.182130  -0.016837 
std    2.833687   2.847120   0.050182 
min    0.007000   0.006000  -0.142857 
25%    0.051375   0.051650  -0.009887 
50%    1.327900   1.333600   0.000346 
75%    3.075775   3.112825   0.009726 
max    8.749700   8.816100   0.023854
```

Size of text section:
```
python3 ../utils/compare.py -m size..text resold10.json resold11.json resold12.json resold13.json resold14.json resold15.json resold16.json resold17.json resold18.json resold19.json resold110.json resold111.json resold112.json resold113.json resold114.json resold115.json resold116.json resold117.json resold118.json resold119.json  vs resnew10.json resnew11.json resnew12.json resnew13.json resnew14.json resnew15.json resnew16.json resnew17.json resnew18.json resnew19.json resnew110.json resnew111.json resnew112.json resnew113.json resnew114.json resnew115.json resnew116.json resnew117.json resnew118.json resnew119.json
Tests: 10
Metric: size..text

/home/spaits/repo/llvm-test-suite/build/../utils/compare.py:206: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  name0 = names[0]
Program                                       size..text                
                                              lhs        rhs       diff 
lencod/lencod                                 763725.00  763725.00  0.0%
mafft/pairlocalalign                          452119.00  452103.00 -0.0%
kimwitu++/kc                                  403971.00  403955.00 -0.0%
tramp3d-v4/tramp3d-v4                         884131.00  884083.00 -0.0%
Bullet/bullet                                 726622.00  726574.00 -0.0%
consumer-typeset/consumer-typeset             442577.00  442529.00 -0.0%
7zip/7zip-benchmark                           907121.00  906945.00 -0.0%
ClamAV/clamscan                               540338.00  540178.00 -0.0%
sqlite3/sqlite3                               490639.00  490447.00 -0.0%
SPASS/SPASS                                   505890.00  505586.00 -0.1%
                           Geomean difference                      -0.0%
          size..text                         
l/r              lhs           rhs       diff
count  10.000000      10.00000      10.000000
mean   611713.300000  611612.50000 -0.000179 
std    190314.056507  190320.06691  0.000195 
min    403971.000000  403955.00000 -0.000601 
25%    461749.000000  461689.00000 -0.000271 
50%    523114.000000  522882.00000 -0.000087 
75%    754449.250000  754437.25000 -0.000043 
max    907121.000000  906945.00000  0.000000
```

I have done these benchmarks on my laptop.
I have a fairly strong laptop with a 12th Gen Intel i7-1265U (12) @ 4.800GHz and 32 GB of RAM, but I don't think I can measure things like compile time and run time reliably on it. The heat of the laptop alone can add or remove whole percentage points from the compile time and exec time. I tried to conduct these measurements so that each benchmark session starts from the same state, but that is really hard.
When running `llvm-lit`, only the run-time results change. The compile time is determined when running `ninja`.
So when comparing exec time, there were really 20 results merged and compared, but when comparing compile time there were effectively just 5 results merged and compared. So after the code size, which is fairly static (you can compile the code 100 times with the same compiler and get the same code size each time), the execution time is the most accurate result.
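
The counting argument above can be sketched as follows. This is only an illustration, assuming 5 builds per version and 4 `llvm-lit` runs per build as stated. The names are hypothetical and no measured values are involved.

```python
from itertools import product

BUILDS = 5          # times each version was compiled with ninja
RUNS_PER_BUILD = 4  # llvm-lit runs per compilation

# One result file per (build, run) pair, 20 per version in total.
result_files = list(product(range(BUILDS), range(RUNS_PER_BUILD)))

# exec_time is re-measured on every llvm-lit run, so each file is a new sample.
exec_time_samples = {(build, run) for build, run in result_files}

# compile_time is fixed when ninja runs, so files from the same build repeat it.
compile_time_samples = {build for build, _run in result_files}

print(len(result_files), len(exec_time_samples), len(compile_time_samples))
```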

I don't have better equipment right now. Also, I have only checked the x86-64 target, which is not that prone to patterns like the one addressed by this patch. I think it would be best to try this out on ARM or RISC-V and to do the compilation in a more consistent environment.

### Also one more possible improvement:
Since I only build the DDG when needed, we no longer have to deal with register renames after the DDG is built, so this whole thing can be done in one stage again.
I will try that; it may reduce compile time further.

https://github.com/llvm/llvm-project/pull/105562