[PATCH] D136169: [AMDGPU] Avoid SCC clobbering before S_CSELECT_B32

Mon Oct 24 09:20:32 PDT 2022

alex-t marked 2 inline comments as done.
alex-t added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:2390
               BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_I32), ScaledReg)
-                .addReg(ScaledReg, RegState::Kill)
-                .addImm(-Offset);
+                  .addReg(ScaledReg, RegState::Kill)
+                  .addImm(-Offset);
----------------
foad wrote:
> Remove these changes from the patch.
This is obviously a mistake. "-Offset" cannot work here.
Given that we always have Reg and Offset aligned, hence even, we should just use S_SUBB_U32 here.

================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.h:96
+  unsigned getSCCLivenessScanMaxLength() const {
+     return 100; 
+  }
----------------
rampitec wrote:
> 100 seems to large number to me. Compiling with spills is already sufficiently and painfully slow. I would not go more than 10 here. In a worst case you are now burning just 2 instructions and probably 8 cycles.
> 
> Besides I do not think it deserves a special function, just a constant threshold right inside the code. Just because it is only used once.
> 100 seems to large number to me. Compiling with spills is already sufficiently and painfully slow. I would not go more than 10 here. In a worst case you are now burning just 2 instructions and probably 8 cycles.
> 
> Besides I do not think it deserves a special function, just a constant threshold right inside the code. Just because it is only used once.

I would not agree. Really there may be tens of the odd SCC preserving code sequences in 100 lines length BB if it has frequent stack access.  So, this is not just 2 instructions but 2*N. I can show you the example of assembly if necessary.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136169/new/

https://reviews.llvm.org/D136169