[PATCH] D136169: [AMDGPU] Avoid SCC clobbering before S_CSELECT_B32

Alexander via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Oct 19 13:10:32 PDT 2022


alex-t marked an inline comment as done.
alex-t added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:2216-2224
+          while (I != MBB->end()) {
+            if (*I != MI && I->readsRegister(AMDGPU::SCC)) {
+              NeedSaveSCC = true;
+              break;
+            }
+            if (*I != MI && I->definesRegister(AMDGPU::SCC))
+              break;
----------------
foad wrote:
> alex-t wrote:
> > foad wrote:
> > > What is this loop for? Isn't the scavenger already supposed to know whether SCC is live?
> > > What is this loop for? Isn't the scavenger already supposed to know whether SCC is live?
> > 
> > The scavenger's knowledge depends on the accurate dead/kill flags insertion. This loop decreases the number of false-positive "live" SCC and accordingly unnecessary SCC saves/restores.
> > 
> > As Matt rightly pointed out, the dead/kill correct placement is out of this change scope.
> > It also is going to be a complex task.
> > 
> > Should I add a corresponding TODO here?
> Haven't dead/kill flags been automatically computed before this code runs?
> 
> If you do need this loop, shouldn't it conservatively assume that SCC is live at the end of the basic block?
Let me illustrate what I mean. For the code snipped below
```
  (1) $sgpr3 = S_ADD_I32 $sgpr33, 32, implicit-def $scc
  (2) $vgpr51 = V_ADD_U32_e64 killed $sgpr3, killed $sgpr43, 0, implicit $exec
  (3) $sgpr3 = S_ADD_I32 $sgpr33, 32, implicit-def $scc
  (4) $vgpr52 = V_ADD_U32_e64 killed $sgpr3, killed $sgpr44, 0, implicit $exec
```
scavenger considers SCC "live" between (1) and (3) because its internal iterator position points to (2). Calling the "advance" method inside the target hook breaks the outer PEI logic. Literally, PEI does not expect the scavenger state to be changed inside the TargetRegisterInfo::LowerFrameIndex method.

In this particular case, I can add the dead flag to the particular place:
```
auto Add = BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_I32), TmpSReg)
            .addReg(FrameReg)
            .addImm(Offset);
Add->getOperand(3).setIsDead();
```
Unfortunately, we have plenty of places like that. And for cases that are not as trivial, we are going to have plenty of odd code.



================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:2216-2224
+          while (I != MBB->end()) {
+            if (*I != MI && I->readsRegister(AMDGPU::SCC)) {
+              NeedSaveSCC = true;
+              break;
+            }
+            if (*I != MI && I->definesRegister(AMDGPU::SCC))
+              break;
----------------
alex-t wrote:
> foad wrote:
> > alex-t wrote:
> > > foad wrote:
> > > > What is this loop for? Isn't the scavenger already supposed to know whether SCC is live?
> > > > What is this loop for? Isn't the scavenger already supposed to know whether SCC is live?
> > > 
> > > The scavenger's knowledge depends on the accurate dead/kill flags insertion. This loop decreases the number of false-positive "live" SCC and accordingly unnecessary SCC saves/restores.
> > > 
> > > As Matt rightly pointed out, the dead/kill correct placement is out of this change scope.
> > > It also is going to be a complex task.
> > > 
> > > Should I add a corresponding TODO here?
> > Haven't dead/kill flags been automatically computed before this code runs?
> > 
> > If you do need this loop, shouldn't it conservatively assume that SCC is live at the end of the basic block?
> Let me illustrate what I mean. For the code snipped below
> ```
>   (1) $sgpr3 = S_ADD_I32 $sgpr33, 32, implicit-def $scc
>   (2) $vgpr51 = V_ADD_U32_e64 killed $sgpr3, killed $sgpr43, 0, implicit $exec
>   (3) $sgpr3 = S_ADD_I32 $sgpr33, 32, implicit-def $scc
>   (4) $vgpr52 = V_ADD_U32_e64 killed $sgpr3, killed $sgpr44, 0, implicit $exec
> ```
> scavenger considers SCC "live" between (1) and (3) because its internal iterator position points to (2). Calling the "advance" method inside the target hook breaks the outer PEI logic. Literally, PEI does not expect the scavenger state to be changed inside the TargetRegisterInfo::LowerFrameIndex method.
> 
> In this particular case, I can add the dead flag to the particular place:
> ```
> auto Add = BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_I32), TmpSReg)
>             .addReg(FrameReg)
>             .addImm(Offset);
> Add->getOperand(3).setIsDead();
> ```
> Unfortunately, we have plenty of places like that. And for cases that are not as trivial, we are going to have plenty of odd code.
> 
> If you do need this loop, shouldn't it conservatively assume that SCC is live at the end of the basic block?

Good point, thanks.


================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:2263
+          Register SCCCopy =
+              RS->scavengeRegister(getBoolRC(), std::prev(P), 0, !UseSGPR);
+          if (!SCCCopy)
----------------
foad wrote:
> alex-t wrote:
> > foad wrote:
> > > None of this code should depend on wave size. You don't want getBoolRC here - that's for VCC which might be a register pair in wave64. You can always save SCC in a single 32-bit SGPR.
> > Since back SGPR to SCC copy depends on EXEC/EXEC_LO and I am using the same register for saving and restoring I need to use 64bit unless isWave32. I am not sure if I can use S_AND_B32 reg, exec_lo in wave64 mode?
> You can just use "s_cmp_lg_u32 sgpr, 0" to copy back to SCC.
Given that this this is pure scalar context yes


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136169/new/

https://reviews.llvm.org/D136169



More information about the llvm-commits mailing list