[PATCH] D114644: [AMDGPU] Aggressively fold immediates in SIShrinkInstructions
Jay Foad via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue May 17 03:17:27 PDT 2022
foad added reviewers: arsenm, rampitec, nhaehnle, tsymalla, piotr, sebastian-ne.
foad added a comment.
In D114644#3156060 <https://reviews.llvm.org/D114644#3156060>, @foad wrote:
> Some data on the combined effect of D114643 <https://reviews.llvm.org/D114643> + D114644 <https://reviews.llvm.org/D114644>, from statically compiling a corpus of 10320 graphics shaders for gfx1010:
>
> - Total number of instructions decreased from 6071567 to 5999110 (-1.2%)
> - Total number of code bytes increased from 35932468 to 36174540 (+0.67%)
> - Total number of vgprs used decreased from 517395 to 517238 (-0.030%)
> - Total number of sgprs used decreased from 811411 to 805549 (-0.72%)
Redoing this analysis for gfx900:
- Total number of instructions decreased from 5839766 to 5790517 (-0.84%)
- Total number of code bytes increased from 30480844 to 30727840 (+0.81%)
- Total number of readlane/writelane instructions decreased from 64049 to 62081 (-3.07%)
- Total number of vgprs used increased from 479581 to 479702 (+0.03%)
- Total number of sgprs used decreased from 766214 to 760162 (-0.79%)
For gfx1030:
- Total number of instructions decreased from 6070932 to 6006155 (-1.07%)
- Total number of code bytes increased from 31346752 to 31645184 (+0.95%)
- Total number of readlane/writelane instructions decreased from 58297 to 56368 (-3.31%)
- Total number of vgprs used decreased from 558964 to 558482 (-0.09%)
- Total number of sgprs used decreased from 805633 to 800257 (-0.67%)
- Total number of v_cmpx instructions increased from 26303 to 26579 (+1.05%)
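As a quick cross-check of the figures above, the relative changes can be recomputed from the raw before/after counts. This small Python sketch copies the counts from the gfx900 and gfx1030 lists and assumes nothing else. The helper names are illustrative only.

```python
# Recompute the relative changes reported above from the raw before/after
# counts (copied verbatim from the gfx900 and gfx1030 lists). Percentages are
# printed to two decimal places, matching how they are reported there.

def pct_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

GFX900 = {
    "instructions": (5839766, 5790517),
    "code bytes": (30480844, 30727840),
    "readlane/writelane": (64049, 62081),
    "vgprs": (479581, 479702),
    "sgprs": (766214, 760162),
}

GFX1030 = {
    "instructions": (6070932, 6006155),
    "code bytes": (31346752, 31645184),
    "readlane/writelane": (58297, 56368),
    "vgprs": (558964, 558482),
    "sgprs": (805633, 800257),
    "v_cmpx": (26303, 26579),
}

def report(name: str, table: dict) -> None:
    print(name)
    for metric, (before, after) in table.items():
        print(f"  {metric}: {before} -> {after} ({pct_change(before, after):+.2f}%)")

if __name__ == "__main__":
    report("gfx900", GFX900)
    report("gfx1030", GFX1030)
```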
The number of readlane/writelane instructions is an indication of how often sgprs get spilled into vgprs.
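The readlane/writelane metric could be gathered by scanning generated assembly for these two mnemonics. The sketch below is a hypothetical helper, not the script used to produce the numbers above. It assumes one instruction per line, as in `llc -march=amdgcn -mcpu=gfx900` output. SGPR spills to VGPR lanes use `v_writelane_b32` and reloads use `v_readlane_b32`, but these instructions can appear for other reasons too, so the count is only a proxy.

```python
import re

# Hypothetical helper: count v_readlane_b32 / v_writelane_b32 instructions in
# AMDGPU assembly text. Only lines whose first token is one of these mnemonics
# are counted, so occurrences inside ';' comments are ignored.
LANE_OPS = re.compile(r"^[ \t]*(v_readlane_b32|v_writelane_b32)\b", re.MULTILINE)

def count_lane_ops(asm_text: str) -> dict:
    counts = {"v_readlane_b32": 0, "v_writelane_b32": 0}
    for match in LANE_OPS.finditer(asm_text):
        counts[match.group(1)] += 1
    return counts

if __name__ == "__main__":
    sample = (
        "  v_writelane_b32 v40, s30, 0\n"   # spill s30 into lane 0 of v40
        "  s_mov_b32 s0, 1\n"
        "  v_readlane_b32 s30, v40, 0\n"    # reload s30 from lane 0 of v40
    )
    print(count_lane_ops(sample))
```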
The number of v_cmpx instructions increases because we sometimes get sequences like this:
  s_mov_b32 s26, 0x3b23d70a
  ...
  v_cmp_ngt_f32_e32 vcc_lo, s26, v17
  s_and_saveexec_b32 s26, vcc_lo
This can't be converted to use v_cmpx, because the two values held in s26 (the constant and the saved exec mask) would have overlapping live ranges:
  s_mov_b32 s26, 0x3b23d70a
  ...
  s_mov_b32 s26, exec_lo // clobbers s26 !!!
  v_cmpx_ngt_f32_e32 s26, v17
But with the constant folded into the v_cmp instruction, it is fine:
  s_mov_b32 s26, exec_lo
  v_cmpx_ngt_f32_e32 0x3b23d70a, v17
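The legality condition can be sketched with a toy model. It assumes a single 32-bit SGPR destination and ignores other checks a real pass needs, such as whether vcc_lo is read afterwards or overlap through register tuples like s[26:27]. The class and function names are hypothetical and do not correspond to LLVM's implementation.

```python
from dataclasses import dataclass

# Toy model (hypothetical, not LLVM's data structures) of rewriting
#   v_cmp_*_e32 vcc_lo, src0, src1
#   s_and_saveexec_b32 sdst, vcc_lo
# into
#   s_mov_b32 sdst, exec_lo
#   v_cmpx_*_e32 src0, src1
# The s_mov is placed before the compare, so the rewrite is only safe if
# sdst is not one of the compare's source registers.

@dataclass
class VCmp:
    opcode: str  # e.g. "v_cmp_ngt_f32_e32"
    src0: str    # register name, or a literal such as "0x3b23d70a"
    src1: str

@dataclass
class SaveExec:
    sdst: str    # register that receives the old exec mask

def is_register(operand: str) -> bool:
    # In this toy model, literals are always written as hex constants.
    return not operand.startswith("0x")

def can_form_vcmpx(cmp: VCmp, saveexec: SaveExec) -> bool:
    sources = [op for op in (cmp.src0, cmp.src1) if is_register(op)]
    return saveexec.sdst not in sources

def rewrite(cmp: VCmp, saveexec: SaveExec) -> list:
    if not can_form_vcmpx(cmp, saveexec):
        raise ValueError(f"{saveexec.sdst} is read by {cmp.opcode}")
    cmpx = cmp.opcode.replace("v_cmp_", "v_cmpx_", 1)
    return [
        f"s_mov_b32 {saveexec.sdst}, exec_lo",
        f"{cmpx} {cmp.src0}, {cmp.src1}",
    ]
```

When the constant is still in s26, `can_form_vcmpx(VCmp("v_cmp_ngt_f32_e32", "s26", "v17"), SaveExec("s26"))` is false. Once the constant is folded, `rewrite(VCmp("v_cmp_ngt_f32_e32", "0x3b23d70a", "v17"), SaveExec("s26"))` yields the s_mov_b32 / v_cmpx_ngt_f32_e32 pair from the last sequence.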
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D114644