[PATCH] D114644: [AMDGPU] Aggressively fold immediates in SIShrinkInstructions

Tue May 17 03:17:27 PDT 2022

foad added reviewers: arsenm, rampitec, nhaehnle, tsymalla, piotr, sebastian-ne.
foad added a comment.

In D114644#3156060 <https://reviews.llvm.org/D114644#3156060>, @foad wrote:

> Some data on the combined effect of D114643 <https://reviews.llvm.org/D114643> + D114644 <https://reviews.llvm.org/D114644>, from statically compiling a corpus of 10320 graphics shaders for gfx1010:
>
> - Total number of instructions decreased from 6071567 to 5999110 (-1.2%)
> - Total number of code bytes increased from 35932468 to 36174540 (+0.67%)
> - Total number of vgprs used decreased from 517395 to 517238 (-0.030%)
> - Total number of sgprs used decreased from 811411 to 805549 (-0.73%)

Redoing this analysis, for gfx900:

- Total number of instructions decreased from 5839766 to 5790517 (-0.84%)
- Total number of code bytes increased from 30480844 to 30727840 (+0.81%)
- Total number of readlane/writelane instructions decreased from 64049 to 62081 (-3.07%)
- Total number of vgprs used increased from 479581 to 479702 (+0.03%)
- Total number of sgprs used decreased from 766214 to 760162 (-0.79%)

For gfx1030:

- Total number of instructions decreased from 6070932 to 6006155 (-1.07%)
- Total number of code bytes increased from 31346752 to 31645184 (+0.95%)
- Total number of readlane/writelane instructions decreased from 58297 to 56368 (-3.31%)
- Total number of vgprs used decreased from 558964 to 558482 (-0.09%)
- Total number of sgprs used decreased from 805633 to 800257 (-0.67%)
- Total number of v_cmpx instructions increased from 26303 to 26579 (+1.05%)

The number of readlane/writelane instructions is an indication of how often sgprs get spilled into vgprs.
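
For reference, a minimal sketch of how such a count could be taken from assembly output. This is not the script used for the numbers above; the function names and the handling of comments, labels and directives are assumptions made for illustration.

```python
import re
from collections import Counter

# Lane-access mnemonics used here as a proxy for SGPR spills into VGPRs.
LANE_OPS = ("v_readlane_b32", "v_writelane_b32")

def count_mnemonics(asm_text):
    """Count how often each instruction mnemonic appears in assembly text."""
    counts = Counter()
    for line in asm_text.splitlines():
        # Drop a trailing comment (';' or '//' style) and surrounding whitespace.
        code = re.split(r";|//", line, maxsplit=1)[0].strip()
        if not code or code.endswith(":") or code.startswith("."):
            continue  # blank line, label, or assembler directive
        counts[code.split()[0]] += 1
    return counts

def count_lane_ops(asm_text):
    """Total number of readlane/writelane instructions in the text."""
    counts = count_mnemonics(asm_text)
    return sum(counts[op] for op in LANE_OPS)
```

Summing count_lane_ops over every shader in the corpus, before and after the patch, would give totals comparable to the ones listed above.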

The reason for the increase in v_cmpx matching is that sometimes we get sequences like this:

  s_mov_b32 s26, 0x3b23d70a
  ...
  v_cmp_ngt_f32_e32 vcc_lo, s26, v17
  s_and_saveexec_b32 s26, vcc_lo

This can't be converted to use v_cmpx because s26 would have to hold the constant and the saved exec mask at the same time:

  s_mov_b32 s26, 0x3b23d70a
  ...
  s_mov_b32 s26, exec_lo // clobbers s26 !!!
  v_cmpx_ngt_f32_e32 s26, v17

But with the constant folded into the v_cmp instruction, it is fine:

  s_mov_b32 s26, exec_lo
  v_cmpx_ngt_f32_e32 0x3b23d70a, v17
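
The condition can be sketched as a toy model. This is not the code in the AMDGPU backend; the Inst type and can_form_cmpx are hypothetical names, and the check covers only the operand overlap described above, not every requirement of the real transformation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Inst:
    opcode: str
    defs: tuple  # registers written
    uses: tuple  # registers or literal constants read

def can_form_cmpx(cmp, saveexec):
    """Toy check: the rewrite inserts 's_mov dst, exec' before the compare,
    so dst must not be one of the compare's source operands."""
    saved_exec_reg = saveexec.defs[0]
    return saved_exec_reg not in cmp.uses

saveexec = Inst("s_and_saveexec_b32", ("s26",), ("vcc_lo",))

# Constant still held in s26: the s_mov would clobber a live source operand.
cmp_reg = Inst("v_cmp_ngt_f32_e32", ("vcc_lo",), ("s26", "v17"))

# Constant folded into the compare: s26 is free to receive the exec mask.
cmp_imm = Inst("v_cmp_ngt_f32_e32", ("vcc_lo",), ("0x3b23d70a", "v17"))
```

In this model can_form_cmpx(cmp_reg, saveexec) is False and can_form_cmpx(cmp_imm, saveexec) is True, matching the two cases above.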

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114644/new/

https://reviews.llvm.org/D114644