[PATCH] D136169: [AMDGPU] Avoid SCC clobbering before S_CSELECT_B32

Mon Nov 21 14:36:33 PST 2022

alex-t added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:2238
+              .addReg(FrameReg)
+              .addImm(Offset);
+          BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_BITCMP1_B32))
----------------
sebastian-ne wrote:
> Maybe it makes sense to add an assert, that the least significant bit of `Offset` is indeed 0? (here and for s_subb below)
As far as I know, the flat scratch offset is always dword-aligned. If it is not we've got the UB anyway. So, the assert does not help.

================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:2261
+                BuildMI(*MBB, std::next(MI), DL, TII->get(AMDGPU::S_SUBB_U32),
+                        TmpSReg)
+                    .addReg(FrameReg)
----------------
foad wrote:
> The bitcmp1/bitset0 trick only works with addc. It doesn't work with subb, because if scc is set then the result will be 1 *less* than you wanted, so bitset0 will not correct it.
Oops. Good point. 
BTW S_ADDC_U32 with "-Offset" may cause overflow itself. So, we override SCC again.
Although offset has uint64_t, I guess that we always have a flat scratch offset value small enough to never get into this trouble. Just because we subtract the same value as we were adding in the previous step.
And yes, we've got no problem with this so far as we had no tests for that case. I am going to add one.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136169/new/

https://reviews.llvm.org/D136169