[PATCH] D146425: [SystemZ] Enable AtomicExpandPass for i128

Jonas Paulsson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Apr 4 02:54:55 PDT 2023


jonpa added a comment.

With full benchmark runs on a few tests I see:

  Improvements "3/50":
  0.994: i523.xalancbmk_r 
  
  Regressions "3/50":
  1.005: f526.blender_r 
  1.004: i502.gcc_r 
  1.003: i505.mcf_r 
  1.002: f507.cactuBSSN_r 



  Improvements "no preg clobbers / no search lim":
  1.000: i502.gcc_r 
  
  Regressions "no preg clobbers / no search lim":
  1.006: f526.blender_r 
  1.005: f507.cactuBSSN_r 
  1.005: i505.mcf_r 
  1.004: i523.xalancbmk_r 

These are small variations. Looking at the spill/COPY counts, the 'blender' result is a bit of a surprise, but generally increased spilling is of course not good:

  main <> "no preg clobbers / no search lim"
  
  f526.blender_r
  Spill|Reload   :                90817                90761      -56
  Copies         :               208906               208506     -400
  
  f507.cactuBSSN_r
  Spill|Reload   :               148023               148036      +13
  Copies         :                59083                59083       +0
  
  i505.mcf_r
  Spill|Reload   :                  702                  706       +4
  Copies         :                  562                  565       +3
  
  i523.xalancbmk_r
  Spill|Reload   :                14979                15029      +50
  Copies         :               134345               134185     -160

So it seems that we may get a few very slight regressions by rewriting shouldCoalesce(). Being careful with a search limit (at 50) helps a little, but not much. We could:

- keep it simple and accept the slight regressions above.
- keep the original version, with the special-case handling added for the CDSG loops, which have intervals spanning multiple blocks.
- constrain this new version further to get results closer to the unmodified version.

It depends a little on what behavior we think would be ideal. Normally coalescing is "good", but coalescing a GR128 live range with a subreg will make the whole live range 128 bits wide, which costs a GR64 register. On the other hand, that GR64 subreg usage typically extends the live interval only a little (at least per my old notes from 2017), so coalescing to avoid the register move may still be worth it.
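As an illustration (not taken from the patch), the typical situation is a compare-and-swap loop on an __int128, where the 128-bit value ends up in a GR128 register pair while the surrounding code mostly works on the 64-bit halves. The following hypothetical snippet is the kind of source that produces such a CDSG loop and the GR64 subreg COPYs in question:

  // Hypothetical example (not from the patch) of source code that lowers to
  // a CDSG loop on SystemZ. The 128-bit value lives in a GR128 register
  // pair, while the increment only involves a 64-bit operand, so GR64
  // subreg COPYs appear around the loop.
  #include <atomic>

  std::atomic<__int128> Val;

  __int128 fetchAdd(long Inc) {
    __int128 Old = Val.load(std::memory_order_relaxed);
    __int128 New;
    do {
      New = Old + Inc;                              // 64-bit operand, 128-bit result
    } while (!Val.compare_exchange_weak(Old, New)); // becomes the CDSG loop
    return Old;
  }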

To me, it seems that we probably want to coalesce away the COPY when it is just a matter of a use a few instructions later, which is the common case. We also want to coalesce the four COPYs inside the CDSG loop which result from the PHI lowering. Extending the GR128 interval over any long distance is probably not worthwhile, as it ties up a full GR64 register. That is why the search limit seems like a good idea to me, and not only to guard against a worst-case compile-time issue (which does not show up on SPEC).
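Just to make the idea concrete, here is a rough sketch (my reading of the discussion, not the actual patch) of how such a search limit could be applied in the shouldCoalesce() hook. It uses the real TargetRegisterInfo::shouldCoalesce() signature, but the constant, the operand handling and the scanning details are made up for illustration:

  // Sketch only: restrict GR128 coalescing to cases where the widened
  // interval ends within a small window after the COPY.
  #include "SystemZRegisterInfo.h"
  #include "llvm/CodeGen/LiveIntervals.h"
  #include "llvm/CodeGen/MachineBasicBlock.h"
  #include "llvm/CodeGen/MachineInstr.h"
  using namespace llvm;

  bool SystemZRegisterInfo::shouldCoalesce(MachineInstr *MI,
                                           const TargetRegisterClass *SrcRC,
                                           unsigned SubReg,
                                           const TargetRegisterClass *DstRC,
                                           unsigned DstSubReg,
                                           const TargetRegisterClass *NewRC,
                                           LiveIntervals &LIS) const {
    // Only coalescing that creates or extends a GR128 live range is a concern.
    if (NewRC != &SystemZ::GR128BitRegClass)
      return true;

    // Assume the COPY destination is the register that becomes 128 bits
    // (a simplification; the real code has to look at both sides).
    Register WideReg = MI->getOperand(0).getReg();
    if (!WideReg.isVirtual())
      return true;

    // Hypothetical limit, corresponding to the "search limit (at 50)" above.
    const unsigned SearchLimit = 50;

    // Walk forward from the COPY and accept the coalescing only if the
    // GR128 interval ends within SearchLimit instructions, so that a full
    // GR64 register is never tied up over a long distance.
    const LiveInterval &LI = LIS.getInterval(WideReg);
    MachineBasicBlock::iterator I(MI), E = MI->getParent()->end();
    for (unsigned Count = 0; I != E && Count < SearchLimit; ++I) {
      if (I->isDebugInstr())
        continue;
      if (LI.endIndex() <= LIS.getInstructionIndex(*I))
        return true;
      ++Count;
    }
    return false;
  }

With a limit like this, the common case of a use a few instructions after the COPY would still be coalesced, while long-distance extensions of the GR128 interval would be rejected.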


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146425/new/

https://reviews.llvm.org/D146425


