[PATCH] D115401: AMDGPU: Fix clobbering SCC when expanding large offset spill pseudos

Tue Dec 14 04:28:22 PST 2021

sebastian-ne added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:189

+    TmpVGPRUsed = RS->isRegUsed(TmpVGPR);
+    if (TmpVGPRUsed) {
----------------
What is the difference between `TmpVGPRUsed` and `TmpVGPRLive`? It looks like the same thing to me?

================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:228
       if (TmpVGPRLive)
-        TRI.buildVGPRSpillLoadStore(*this, TmpVGPRIndex, 0, /*IsLoad*/ false,
-                                    /*IsKill*/ false);
+        TRI.buildVGPRSpillLoadStore(*this, TmpVGPRIndex, 0, /*IsLoad*/ false);
       // Spill inactive lanes
----------------
Isn’t TmpVGPR still used in the second `buildVGPRSpillLoadStore`? How does this work if it’s killed here?

================
Comment at: llvm/test/CodeGen/AMDGPU/swdev309419-agpr.mir:1588-1593
+  ; GFX908-NEXT:   S_CMP_EQ_U32 0, 0, implicit-def $scc
+  ; GFX908-NEXT:   BUFFER_STORE_DWORD_OFFSET killed $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, implicit $exec :: (store (s32) into %stack.2, addrspace 5)
+  ; GFX908-NEXT:   $vgpr0 = V_MOV_B32_e32 8200, implicit $exec
+  ; GFX908-NEXT:   $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec
+  ; GFX908-NEXT:   BUFFER_STORE_DWORD_OFFEN $vgpr0, $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, implicit $exec :: (store (s32) into %stack.1, addrspace 5)
+  ; GFX908-NEXT:   $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, implicit $exec :: (load (s32) from %stack.2, addrspace 5)
----------------
This looks incorrect. It’s trying to use v0 as offset and agpr intermediate at the same time.

================
Comment at: llvm/test/CodeGen/AMDGPU/swdev309419.mir:8
+
+# Test that if we have a spill with a live SCC def, and we the offset
+# does not fit in the immediate offset we do not clobber SCC.
----------------
Typo: we the offset

================
Comment at: llvm/test/CodeGen/AMDGPU/swdev309419.mir:1074-1078
+  ; GFX10-FLATSCR-NEXT:   S_CMP_EQ_U32 0, 0, implicit-def $scc
+  ; GFX10-FLATSCR-NEXT:   $vcc_lo = S_ADD_I32 $sgpr32, 8200, implicit-def $scc
+  ; GFX10-FLATSCR-NEXT:   $vgpr1 = V_MOV_B32_e32 killed $vcc_lo, implicit $exec
+  ; GFX10-FLATSCR-NEXT:   $vgpr0 = BUFFER_LOAD_DWORD_OFFEN killed $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, implicit $exec :: (load (s32) from %stack.1, addrspace 5)
+  ; GFX10-FLATSCR-NEXT:   S_CBRANCH_SCC1 %bb.2, implicit $scc
----------------
I guess this is expected to still incorrectly overwrite $scc?
Maybe add a fixme?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115401/new/

https://reviews.llvm.org/D115401