[PATCH] D156491: [RA] Split a virtual register in cold blocks if it is not assigned preferred physical register

Thu Aug 10 17:41:01 PDT 2023

Carrot added a comment.

In D156491#4540597 <https://reviews.llvm.org/D156491#4540597>, @MatzeB wrote:

> Interesting! This is showing some neat improvements enabling more shrink-wrapping in the test-cases.
>
> Though I suspect changes of the splitting algorithm could trigger regressions (given it's all heuristics). So I would feel better if this change was backed up by more statistics. (Something like compiling llvm-test-suite and collecting regalloc stats or remarks?) and/or report any other benchmarks you did?

The following are the regalloc stats from compiling SPECINT2006 with FDO:

          stats                                            old       new
  Number of copies inserted for splitting                103020    121524
  Number of splits finished                               19248     24346
  Number of split global live ranges                      11068     16305
  Number of new live ranges queued                       157359    172528
  Number of rematerialized defs for splitting              2756      5536
  Number of splits that were simple                        8605     12145

  Number of registers assigned                           649170    663979
  Number of instructions deleted by DCE                   83051     84678
  Number of interferences evicted                         21432     21320
  Number of folded stack accesses                         10992     10914
  Number of folded loads                                    623       624
  Number of live ranges fractured by DCE                    406       390
  Number of identity moves eliminated after rewriting    232947    240607
  Number of dead lane conflicts tested                     7451      7503
  Number of dead lane conflicts resolved                   3943      3957
  Number of split local live ranges                        2280      2230
  Number of instructions rematerialized                  122540    126312
  Number of instructions re-materialized                  93828     94520
  Number of reloads inserted                              50031     51344
  Number of reloads removed                                 827       787
  Number of rematerialized defs for spilling              25833     26237
  Number of shrinkToUses called                           94489     95202
  Number of spilled snippets                               1083      1100
  Number of spill slots allocated                         15944     16053
  Number of spilled live ranges                           28650     28776
  Number of spills inserted                               22428     22448
  Number of spills removed                                  878       778
  Number of registers unassigned                          26406     26478
  Number of instruction commuting performed                 640       650
  Number of cross class joins performed                  159091    161155
  Number of copies extended                                  15        14
  Number of interval joins performed                     508329    506998
  Number of register classes inflated                         4         4
  Number of single use loads folded after DCE                32        32

The first 6 stats are much larger than the old values; they are the direct result of the new splitting. The other stats are not significantly affected.
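To put "much larger" in numbers, the relative change of the first six stats can be computed directly from the table. This is a minimal Python sketch; the dictionary and the helper `pct_change` are illustrative names, not part of LLVM, and the values are copied from the table:

```python
# First six regalloc stats from the table above: (old, new).
# These are the rows attributed to the new splitting.
SPLIT_STATS = {
    "Number of copies inserted for splitting": (103020, 121524),
    "Number of splits finished": (19248, 24346),
    "Number of split global live ranges": (11068, 16305),
    "Number of new live ranges queued": (157359, 172528),
    "Number of rematerialized defs for splitting": (2756, 5536),
    "Number of splits that were simple": (8605, 12145),
}


def pct_change(old: int, new: int) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    for name, (old, new) in SPLIT_STATS.items():
        print(f"{name:<45} {pct_change(old, new):+7.1f}%")
```

The increases range from about +9.6% (new live ranges queued) to about +101% (rematerialized defs for splitting).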

The runtime performance of SPECINT2006 on my desktop is as follows:

  without this patch
  Benchmark                                    Ref Time   Run Time      Ratio
  400.perlbench                                    9770        227       43.1 *
  401.bzip2                                        9650        370       26.1 *
  403.gcc                                          8050        194       41.6 *
  429.mcf                                          9120        198       46.1 *
  445.gobmk                                       10490        342       30.7 *
  456.hmmer                                        9330        246       37.9 *
  458.sjeng                                       12100        372       32.5 *
  462.libquantum                                  20720        297       69.9 *
  464.h264ref                                     22130        328       67.5 *
  471.omnetpp                                      6250        233       26.8 *
  473.astar                                        7020        294       23.9 *
  483.xalancbmk                                    6900        159       43.4 *
   Est. SPECint(R)_base2006           Not Run
   Est. SPECint2006                                                      38.5

  with this patch
  Benchmark                                    Ref Time   Run Time      Ratio
  400.perlbench                                    9770        227       43.0 *
  401.bzip2                                        9650        373       25.8 *
  403.gcc                                          8050        191       42.1 *
  429.mcf                                          9120        200       45.6 *
  445.gobmk                                       10490        346       30.3 *
  456.hmmer                                        9330        249       37.5 *
  458.sjeng                                       12100        376       32.2 *
  462.libquantum                                  20720        310       66.8 *
  464.h264ref                                     22130        324       68.2 *
  471.omnetpp                                      6250        232       27.0 *
  473.astar                                        7020        280       25.1 *
  483.xalancbmk                                    6900        158       43.7 *
   Est. SPECint(R)_base2006           Not Run
   Est. SPECint2006                                                      38.5
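SPEC CPU2006 reports the overall SPECint2006 number as the geometric mean of the per-benchmark ratios (reference time / run time), so per-benchmark gains and losses can cancel out. The sketch below recomputes both estimates from the ratios printed above. It uses the rounded ratios shown in the tables rather than SPEC's raw timings, so it is only a sanity check; `geomean` is an illustrative helper:

```python
import math

# Per-benchmark ratios copied from the two result tables above (same order).
OLD_RATIOS = [43.1, 26.1, 41.6, 46.1, 30.7, 37.9, 32.5, 69.9, 67.5, 26.8, 23.9, 43.4]
NEW_RATIOS = [43.0, 25.8, 42.1, 45.6, 30.3, 37.5, 32.2, 66.8, 68.2, 27.0, 25.1, 43.7]


def geomean(values):
    """Geometric mean, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    print(f"without patch: {geomean(OLD_RATIOS):.1f}")
    print(f"with patch:    {geomean(NEW_RATIOS):.1f}")
```

Both values round to 38.5, matching the two "Est. SPECint2006" lines.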

The final scores are the same, but 462.libquantum and 473.astar show large differences, so I double-checked both.

462.libquantum
It has large run-to-run variation. Because this patch mainly impacts the dynamic number of move instructions, I compared the executed instruction counts of the two versions: 2314346841712 vs. 2314346834848, essentially no difference. I also checked the 6 hottest functions, and they are identical in both versions. So there is no regression in 462.libquantum.
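The scale of the two libquantum measurements can be compared with a few lines of arithmetic. This minimal Python sketch uses only numbers quoted in this comment (instruction counts from the paragraph above, run times from the SPEC tables); `rel_change` is an illustrative helper:

```python
# 462.libquantum numbers copied from this comment.
OLD_INSTS = 2314346841712  # executed instructions, without the patch
NEW_INSTS = 2314346834848  # executed instructions, with the patch
OLD_TIME_S = 297           # run time without the patch (SPEC table)
NEW_TIME_S = 310           # run time with the patch (SPEC table)


def rel_change(old: float, new: float) -> float:
    """Relative change from old to new (0.01 means +1%)."""
    return (new - old) / old


if __name__ == "__main__":
    print(f"instruction delta:  {NEW_INSTS - OLD_INSTS}")
    print(f"instruction change: {rel_change(OLD_INSTS, NEW_INSTS):.2e}")
    print(f"run-time change:    {rel_change(OLD_TIME_S, NEW_TIME_S):+.1%}")
```

The instruction counts differ by 6864 (a few parts per billion), while the run time changes by about +4.4%, which supports attributing the run-time difference to measurement noise.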

473.astar
This benchmark has a very stable run time, and the performance difference reproduces consistently. I also checked the dynamic instruction counts: the new version executes fewer instructions, so this improvement is real.

  input                    old             new
  BigLakes2048.cfg    334603128498    322906773354
  rivers.cfg          663093493368    653682859787
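The relative reduction in executed instructions for 473.astar follows directly from the table above. This is a minimal Python sketch; `ASTAR_INSTS` and `reduction_pct` are illustrative names:

```python
# Dynamic instruction counts for 473.astar copied from the table above: (old, new).
ASTAR_INSTS = {
    "BigLakes2048.cfg": (334603128498, 322906773354),
    "rivers.cfg": (663093493368, 653682859787),
}


def reduction_pct(old: int, new: int) -> float:
    """Percentage of instructions removed going from old to new."""
    return (old - new) / old * 100.0


if __name__ == "__main__":
    for cfg, (old, new) in ASTAR_INSTS.items():
        print(f"{cfg:<18} {old - new:>12} fewer instructions "
              f"({reduction_pct(old, new):.2f}%)")
```

This gives roughly a 3.5% reduction for BigLakes2048.cfg and 1.4% for rivers.cfg; for comparison, the measured run time went from 294 s to 280 s (about 4.8%).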


https://reviews.llvm.org/D156491