[PATCH] D146425: [SystemZ] Enable AtomicExpandPass for i128
Jonas Paulsson via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Apr 4 02:54:55 PDT 2023
jonpa added a comment.
With full benchmark runs on a few tests I see:
Improvements "3/50":
0.994: i523.xalancbmk_r
Regressions "3/50":
1.005: f526.blender_r
1.004: i502.gcc_r
1.003: i505.mcf_r
1.002: f507.cactuBSSN_r
Improvements "no preg clobbers / no search lim":
1.000: i502.gcc_r
Regressions "no preg clobbers / no search lim":
1.006: f526.blender_r
1.005: f507.cactuBSSN_r
1.005: i505.mcf_r
1.004: i523.xalancbmk_r
These are small variations. Looking at the spill/COPY counts, it is a bit of surprise with 'blender', but generally increased spilling is of course not good:
main <> "no preg clobbers / no search lim"
f526.blender_r
Spill|Reload : 90817 90761 -56
Copies : 208906 208506 -400
f507.cactuBSSN_r
Spill|Reload : 148023 148036 +13
Copies : 59083 59083 +0
i505.mcf_r
Spill|Reload : 702 706 +4
Copies : 562 565 +3
i523.xalancbmk_r
Spill|Reload : 14979 15029 +50
Copies : 134345 134185 -160
So it seems that we may get a few very slight regressions by rewriting shouldCoalesce(). Being a little careful with a search limit (at 50) helps a little, but not much. We could:
- keep it simple and accept the slight regressions above.
- keep the original version with the special case handling for the CDSG loops added, which have intervals spanning multiple blocks.
- constrain this new version further to get results closer to the unmodified version.
It depends a little on what behavior we think would be ideal. Normally coalescing is "good", but coalescing a GR128 live-range with a subreg will make the whole liverange 128bits, which costs a GR64 register. On the other hand, that GR64 subreg usage is typically just a short extension (at least per my old notes from 2017) of the live interval, so it may therefore still be better to avoid a register move.
To me, it seems that we probably want to coalesce away the COPY when it is just a matter of a use a few instructions later, which is the common case. We also want to coalesce the 4 COPYs inside the LSDG loop which result from the PHI lowering. Prolonging the GR128 interval any long distance is probably not worthwhile as it costs a full GR64 register. That is why the search limit is a good idea to me not only for the sake of a worst case compile time issue (which there is not on SPEC).
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D146425/new/
https://reviews.llvm.org/D146425
More information about the llvm-commits
mailing list