[PATCH] D136169: [AMDGPU] Avoid SCC clobbering before S_CSELECT_B32
    Alexander via Phabricator via llvm-commits 
    llvm-commits at lists.llvm.org
       
    Mon Nov 21 14:36:33 PST 2022
    
    
  
alex-t added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:2238
+              .addReg(FrameReg)
+              .addImm(Offset);
+          BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_BITCMP1_B32))
----------------
sebastian-ne wrote:
> Maybe it makes sense to add an assert, that the least significant bit of `Offset` is indeed 0? (here and for s_subb below)
As far as I know, the flat scratch offset is always dword-aligned. If it is not we've got the UB anyway. So, the assert does not help.
================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:2261
+                BuildMI(*MBB, std::next(MI), DL, TII->get(AMDGPU::S_SUBB_U32),
+                        TmpSReg)
+                    .addReg(FrameReg)
----------------
foad wrote:
> The bitcmp1/bitset0 trick only works with addc. It doesn't work with subb, because if scc is set then the result will be 1 *less* than you wanted, so bitset0 will not correct it.
Oops. Good point. 
BTW S_ADDC_U32 with "-Offset" may cause overflow itself. So, we override SCC again.
Although offset has uint64_t, I guess that we always have a flat scratch offset value small enough to never get into this trouble. Just because we subtract the same value as we were adding in the previous step.
And yes, we've got no problem with this so far as we had no tests for that case. I am going to add one.
Repository:
  rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136169/new/
https://reviews.llvm.org/D136169
    
    
More information about the llvm-commits
mailing list